2025-08-14T21:18:23.7726776Z Current runner version: '2.327.1' 2025-08-14T21:18:23.7732085Z Runner name: 'pytorch-rocm-hw-43' 2025-08-14T21:18:23.7732790Z Runner group name: 'linux.rocm.gpu.group' 2025-08-14T21:18:23.7733639Z Machine name: 'pytorch-rocm-hw-43' 2025-08-14T21:18:23.7736068Z ##[group]GITHUB_TOKEN Permissions 2025-08-14T21:18:23.7737785Z Contents: read 2025-08-14T21:18:23.7738271Z Metadata: read 2025-08-14T21:18:23.7738794Z ##[endgroup] 2025-08-14T21:18:23.7740671Z Secret source: Actions 2025-08-14T21:18:23.7741350Z Prepare workflow directory 2025-08-14T21:18:24.1565109Z Prepare all required actions 2025-08-14T21:18:24.1603850Z Getting action download info 2025-08-14T21:18:24.7408044Z Download action repository 'pytorch/pytorch@main' (SHA:3be70dc30e893b552fc0f23ca06cd8f7949b6d08) 2025-08-14T21:18:30.6538436Z Download action repository 'aws-actions/configure-aws-credentials@ececac1a45f3b08a01d2dd070d28d111c5fe6722' (SHA:ececac1a45f3b08a01d2dd070d28d111c5fe6722) 2025-08-14T21:18:31.2565879Z Download action repository 'aws-actions/amazon-ecr-login@062b18b96a7aff071d4dc91bc00c4c1a7945b076' (SHA:062b18b96a7aff071d4dc91bc00c4c1a7945b076) 2025-08-14T21:18:31.8427630Z Download action repository 'pytorch/test-infra@main' (SHA:83f58f391e939c10dcb8cb6d745e4cefa3b98a84) 2025-08-14T21:18:33.2358123Z Download action repository 'actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02' (SHA:ea165f8d65b6e75b540449e92b4886f43607fa02) 2025-08-14T21:18:34.0995280Z Getting action download info 2025-08-14T21:18:34.2716134Z Download action repository 'actions/checkout@v4' (SHA:08eba0b27e820071cde6df949e0beb9ba4906955) 2025-08-14T21:18:34.9125701Z Getting action download info 2025-08-14T21:18:35.1003880Z Download action repository 'nick-fields/retry@v3.0.0' (SHA:7152eba30c6575329ac0576536151aca5a72780e) 2025-08-14T21:18:35.6936744Z Getting action download info 2025-08-14T21:18:35.8860395Z Uses: pytorch/pytorch/.github/workflows/_rocm-test.yml@refs/heads/main (1fc683cf17c8c673044538d10266c00f92987be2) 2025-08-14T21:18:35.8866417Z ##[group] Inputs 2025-08-14T21:18:35.8866933Z build-environment: linux-jammy-rocm-py3.10 2025-08-14T21:18:35.8869109Z test-matrix: {"include": [{"config": "default", "shard": 1, "num_shards": 6, "runner": "linux.rocm.gpu.2"}, {"config": "default", "shard": 2, "num_shards": 6, "runner": "linux.rocm.gpu.2"}, {"config": "default", "shard": 3, "num_shards": 6, "runner": "linux.rocm.gpu.2"}, {"config": "default", "shard": 4, "num_shards": 6, "runner": "linux.rocm.gpu.2"}, {"config": "default", "shard": 5, "num_shards": 6, "runner": "linux.rocm.gpu.2"}, {"config": "default", "shard": 6, "num_shards": 6, "runner": "linux.rocm.gpu.2"}]} 2025-08-14T21:18:35.8871714Z docker-image: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-rocm-n-py3-bfa89110622ba7202628e9faac705f183070defe 2025-08-14T21:18:35.8872682Z sync-tag: 2025-08-14T21:18:35.8873792Z timeout-minutes: 300 2025-08-14T21:18:35.8874164Z tests-to-include: 2025-08-14T21:18:35.8874500Z dashboard-tag: 2025-08-14T21:18:35.8875581Z disable-monitor: true 2025-08-14T21:18:35.8875992Z monitor-log-interval: 5 2025-08-14T21:18:35.8876414Z monitor-data-collect-interval: 1 2025-08-14T21:18:35.8876820Z ##[endgroup] 2025-08-14T21:18:35.8877353Z Complete job name: linux-jammy-rocm-py3.10 / test (default, 2, 6, linux.rocm.gpu.2) 2025-08-14T21:18:36.0096742Z ##[group]Run pytorch/pytorch/.github/actions/checkout-pytorch@main 2025-08-14T21:18:36.0097343Z with: 2025-08-14T21:18:36.0129513Z no-sudo: true 2025-08-14T21:18:36.0129746Z submodules: recursive 2025-08-14T21:18:36.0129957Z fetch-depth: 0 2025-08-14T21:18:36.0130495Z env: 2025-08-14T21:18:36.0130717Z GIT_DEFAULT_BRANCH: main 2025-08-14T21:18:36.0130913Z ##[endgroup] 2025-08-14T21:18:36.0202988Z ##[group]Run echo "IN_CONTAINER_RUNNER=$(if [ -f /.inarc ] || [ -f /.incontainer ]; then echo true ; else echo false; fi)" >> "$GITHUB_OUTPUT" 2025-08-14T21:18:36.0203724Z echo "IN_CONTAINER_RUNNER=$(if [ -f /.inarc ] || [ -f /.incontainer ]; then echo true ; else echo false; fi)" >> "$GITHUB_OUTPUT" 2025-08-14T21:18:36.0233215Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-08-14T21:18:36.0233508Z env: 2025-08-14T21:18:36.0233660Z GIT_DEFAULT_BRANCH: main 2025-08-14T21:18:36.0233851Z ##[endgroup] 2025-08-14T21:18:36.0547392Z ##[group]Run # Use all available CPUs for fetching 2025-08-14T21:18:36.0548195Z # Use all available CPUs for fetching 2025-08-14T21:18:36.0548797Z cd "${GITHUB_WORKSPACE}" 2025-08-14T21:18:36.0549378Z git config --global fetch.parallel 0 2025-08-14T21:18:36.0550017Z git config --global submodule.fetchJobs 0 2025-08-14T21:18:36.0550698Z  2025-08-14T21:18:36.0551291Z # Clean workspace. The default checkout action should also do this, but 2025-08-14T21:18:36.0552065Z # do it here as well just in case 2025-08-14T21:18:36.0552595Z if [[ -d .git ]]; then 2025-08-14T21:18:36.0553082Z  if [ -z "${NO_SUDO}" ]; then 2025-08-14T21:18:36.0553592Z  sudo git clean -ffdx 2025-08-14T21:18:36.0554072Z  else 2025-08-14T21:18:36.0554464Z  git clean -ffdx 2025-08-14T21:18:36.0554886Z  fi 2025-08-14T21:18:36.0555243Z fi 2025-08-14T21:18:36.0607097Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-08-14T21:18:36.0607769Z env: 2025-08-14T21:18:36.0608126Z GIT_DEFAULT_BRANCH: main 2025-08-14T21:18:36.0608545Z NO_SUDO: true 2025-08-14T21:18:36.0608905Z ##[endgroup] 2025-08-14T21:18:36.5986328Z Removing .additional_ci_files/ 2025-08-14T21:18:36.5987278Z Removing build/ 2025-08-14T21:18:36.5987735Z Removing dist/ 2025-08-14T21:18:36.5988230Z Removing test/test-reports/ 2025-08-14T21:18:36.6112111Z ##[group]Run actions/checkout@v4 2025-08-14T21:18:36.6112653Z with: 2025-08-14T21:18:36.6113077Z ref: 1fc683cf17c8c673044538d10266c00f92987be2 2025-08-14T21:18:36.6113645Z fetch-depth: 0 2025-08-14T21:18:36.6114052Z submodules: recursive 2025-08-14T21:18:36.6114470Z show-progress: false 2025-08-14T21:18:36.6114941Z repository: pytorch/pytorch 2025-08-14T21:18:36.6115649Z token: *** 2025-08-14T21:18:36.6116025Z ssh-strict: true 2025-08-14T21:18:36.6116396Z ssh-user: git 2025-08-14T21:18:36.6116832Z persist-credentials: true 2025-08-14T21:18:36.6117322Z clean: true 2025-08-14T21:18:36.6117750Z sparse-checkout-cone-mode: true 2025-08-14T21:18:36.6118245Z fetch-tags: false 2025-08-14T21:18:36.6118610Z lfs: false 2025-08-14T21:18:36.6118976Z set-safe-directory: true 2025-08-14T21:18:36.6119391Z env: 2025-08-14T21:18:36.6119736Z GIT_DEFAULT_BRANCH: main 2025-08-14T21:18:36.6120145Z ##[endgroup] 2025-08-14T21:18:36.7301188Z Syncing repository: pytorch/pytorch 2025-08-14T21:18:36.7303654Z ##[group]Getting Git version info 2025-08-14T21:18:36.7304553Z Working directory is '/home/pytorchci/actions-runner/_work/pytorch/pytorch' 2025-08-14T21:18:36.7305824Z [command]/usr/bin/git version 2025-08-14T21:18:36.7306306Z git version 2.34.1 2025-08-14T21:18:36.7307812Z ##[endgroup] 2025-08-14T21:18:36.7314127Z Copying '/home/pytorchci/.gitconfig' to '/home/pytorchci/actions-runner/_work/_temp/51ca06da-ec88-46fb-a40a-71b772319a90/.gitconfig' 2025-08-14T21:18:36.7316649Z Temporarily overriding HOME='/home/pytorchci/actions-runner/_work/_temp/51ca06da-ec88-46fb-a40a-71b772319a90' before making global git config changes 2025-08-14T21:18:36.7318351Z Adding repository directory to the temporary git global config as a safe directory 2025-08-14T21:18:36.7319655Z [command]/usr/bin/git config --global --add safe.directory /home/pytorchci/actions-runner/_work/pytorch/pytorch 2025-08-14T21:18:36.7321597Z [command]/usr/bin/git config --local --get remote.origin.url 2025-08-14T21:18:36.7326203Z https://github.com/pytorch/pytorch 2025-08-14T21:18:36.7343721Z ##[group]Removing previously created refs, to avoid conflicts 2025-08-14T21:18:36.7347054Z [command]/usr/bin/git rev-parse --symbolic-full-name --verify --quiet HEAD 2025-08-14T21:18:36.7384583Z HEAD 2025-08-14T21:18:36.7442764Z ##[endgroup] 2025-08-14T21:18:36.7446615Z [command]/usr/bin/git submodule status 2025-08-14T21:18:36.7853997Z 7e1e1fe3858c63c251c637ae41a20de425dde96f android/libs/fbjni (v0.1.0-12-g7e1e1fe) 2025-08-14T21:18:36.7935936Z 4dfe081cf6bcd15db339cf2680b9281b8451eeb3 third_party/FP16 (4dfe081) 2025-08-14T21:18:36.8061228Z b408327ac2a15ec3e43352421954f5b1967701d1 third_party/FXdiv (b408327) 2025-08-14T21:18:36.8210322Z c07e3a0400713d546e0dea2d5466dd22ea389c73 third_party/NNPACK (c07e3a0) 2025-08-14T21:18:36.8302621Z 2942f167cc30c5e3a44a2aecd5b0d9c07ff61a07 third_party/NVTX (v3.1.0-263-g2942f16) 2025-08-14T21:18:36.8438210Z 1d8f600fd424278486eade7ed3e877c99f0846b1 third_party/VulkanMemoryAllocator (v2.1.0-982-g1d8f600) 2025-08-14T21:18:36.9032858Z 51a0103656eff6fc9bfd39a4597923c4b542c883 third_party/XNNPACK (remotes/origin/ds/ndk-1243-g51a0103656) 2025-08-14T21:18:36.9097029Z 01aae101b9e5e94d6c16a9514c9fb8df99c93150 third_party/aiter (v0.1.1-92-g01aae101) 2025-08-14T21:18:36.9145080Z 299e5928955cc62af9968370293b916f5130916f third_party/benchmark (v1.9.3) 2025-08-14T21:18:36.9294692Z 7fe50dc3da2069d6645d9deb8c017a876472a977 third_party/composable_kernel (rocm-6.4.3-459-g7fe50dc3d) 2025-08-14T21:18:36.9524599Z 3af7f2c16147f3fbc6e4d717032daf505dc1652c third_party/cpp-httplib (v0.20.1) 2025-08-14T21:18:36.9727028Z 5e3d2445e6a84d9599bee2bf78edbb4d80865e1d third_party/cpuinfo (5e3d244) 2025-08-14T21:18:36.9800693Z f937055efc6d414d11f4c6577e3977fe74f35fb6 third_party/cudnn_frontend (v0.5-52-gf937055) 2025-08-14T21:18:36.9964676Z e51efbfe18fe4f4cbb66ab814c55bf4aa0185491 third_party/cutlass (v4.1.0) 2025-08-14T21:18:37.0052108Z 21c7d30c526c0f1ad873ecc632dca6cfa8a69067 third_party/fbgemm (v1.3.0-rc1-165-g21c7d30c) 2025-08-14T21:18:37.0194759Z 979702c87a8713a8e0a5e9fee122b90d2ef13be5 third_party/flash-attention (v2.7.4) 2025-08-14T21:18:37.0248311Z a2cd1ea3b6d3fee220106b5fed3f7ce8da9eb757 third_party/flatbuffers (v24.12.23) 2025-08-14T21:18:37.0729603Z 40626af88bd7df9a5fb80be7b25ac85b122d6c21 third_party/fmt (11.2.0) 2025-08-14T21:18:37.0919776Z 3fb5c176c17c765a3492cd2f0321b0dab712f350 third_party/gemmlowp/gemmlowp (remotes/origin/revert-87-master-135-g3fb5c17) 2025-08-14T21:18:37.1133903Z c7b7b022c124d9643957d9bd55f57ac59fce8fa2 third_party/gloo (remotes/origin/gh/c-p-i-o/1/base-33-gc7b7b02) 2025-08-14T21:18:37.1447594Z 52eb8108c5bdec04579160ae17225d66034bd723 third_party/googletest (release-1.8.0-3544-g52eb8108) 2025-08-14T21:18:37.1595741Z 719d8e6cd7f7a0e01b155657526d693acf97c2b3 third_party/ideep (pytorch-rls-v3.7.1) 2025-08-14T21:18:37.1692267Z dec1d23ca65ab069d225dfe40dea14f455170959 third_party/ittapi (v3.25.5) 2025-08-14T21:18:37.2129964Z 5e7501833f1021ce6f618572d3baf657b6319658 third_party/kineto (remotes/origin/sraikund/test-98-g5e75018) 2025-08-14T21:18:37.2177692Z cca02c2f69dd18e1f12647c1c0bdc8cf90e680c7 third_party/kleidiai (v1.8.0) 2025-08-14T21:18:37.2209684Z fbd8b99c2b828428947d70fdc046bb55609be93e third_party/mimalloc (v2.2.4) 2025-08-14T21:18:37.2261733Z 55f93686c01528224f448c19128836e7df245f72 third_party/nlohmann (v3.12.0) 2025-08-14T21:18:37.2700363Z e709452ef2bbc1d113faf678c24e6d3467696e83 third_party/onnx (v1.18.0) 2025-08-14T21:18:37.2730252Z a799f4aed9c94b765dcdaabaeab7d5e7e2310878 third_party/opentelemetry-cpp (v1.14.2) 2025-08-14T21:18:37.2790333Z 0fa0ef591e38c2758e3184c6c23e497b9f732ffa third_party/pocketfft (release_for_eigen-40-g0fa0ef5) 2025-08-14T21:18:37.3137410Z d1eca4e4b421cd2997495c4b4e65cea6be4e9b8a third_party/protobuf (v3.7.0-rc.2-1279-gd1eca4e4b) 2025-08-14T21:18:37.3299411Z 072586a71b55b7f8c584153d223e95687148a900 third_party/psimd (heads/master) 2025-08-14T21:18:37.3401547Z 4fe0e1e183925bf8cfa6aae24237e724a96479b8 third_party/pthreadpool (0.1-144-g4fe0e1e) 2025-08-14T21:18:37.3485334Z a2e59f0e7065404b44dfe92a28aca47ba1378dc4 third_party/pybind11 (v2.11.0-182-ga2e59f0e) 2025-08-14T21:18:37.3610053Z f45429b087dd7d5bc78bb40dc7cf06425c252d67 third_party/python-peachpy (remotes/origin/pre-generated) 2025-08-14T21:18:37.3753737Z 5a1d179df9cf652951b59010a2d2075372d67f68 third_party/sleef (3.8) 2025-08-14T21:18:37.3887261Z dacda0567d9f23d4bc503e1c4f84aa65f33ac38a third_party/tensorpipe (heads/main) 2025-08-14T21:18:37.3912854Z ##[group]Cleaning the repository 2025-08-14T21:18:37.3920205Z [command]/usr/bin/git clean -ffdx 2025-08-14T21:18:37.4331938Z [command]/usr/bin/git reset --hard HEAD 2025-08-14T21:18:37.5188007Z HEAD is now at 65053c03a3d [FR] Don't check incomplete ranks for printing (#160195) 2025-08-14T21:18:37.5218697Z ##[endgroup] 2025-08-14T21:18:37.5219552Z ##[group]Disabling automatic garbage collection 2025-08-14T21:18:37.5226114Z [command]/usr/bin/git config --local gc.auto 0 2025-08-14T21:18:37.5288382Z ##[endgroup] 2025-08-14T21:18:37.5289066Z ##[group]Setting up auth 2025-08-14T21:18:37.5298414Z [command]/usr/bin/git config --local --name-only --get-regexp core\.sshCommand 2025-08-14T21:18:37.5359210Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'core\.sshCommand' && git config --local --unset-all 'core.sshCommand' || :" 2025-08-14T21:18:37.5772949Z Entering 'android/libs/fbjni' 2025-08-14T21:18:37.5834968Z Entering 'third_party/FP16' 2025-08-14T21:18:37.5899824Z Entering 'third_party/FXdiv' 2025-08-14T21:18:37.5953460Z Entering 'third_party/NNPACK' 2025-08-14T21:18:37.6006224Z Entering 'third_party/NVTX' 2025-08-14T21:18:37.6054445Z Entering 'third_party/VulkanMemoryAllocator' 2025-08-14T21:18:37.6106307Z Entering 'third_party/XNNPACK' 2025-08-14T21:18:37.6166974Z Entering 'third_party/aiter' 2025-08-14T21:18:37.6226054Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-08-14T21:18:37.6285299Z Entering 'third_party/benchmark' 2025-08-14T21:18:37.6327808Z Entering 'third_party/composable_kernel' 2025-08-14T21:18:37.6382940Z Entering 'third_party/cpp-httplib' 2025-08-14T21:18:37.6421656Z Entering 'third_party/cpuinfo' 2025-08-14T21:18:37.6462364Z Entering 'third_party/cudnn_frontend' 2025-08-14T21:18:37.6514288Z Entering 'third_party/cutlass' 2025-08-14T21:18:37.6567818Z Entering 'third_party/fbgemm' 2025-08-14T21:18:37.6614692Z Entering 'third_party/fbgemm/external/asmjit' 2025-08-14T21:18:37.6671286Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-08-14T21:18:37.6724417Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-08-14T21:18:37.6768218Z Entering 'third_party/fbgemm/external/cutlass' 2025-08-14T21:18:37.6817972Z Entering 'third_party/fbgemm/external/googletest' 2025-08-14T21:18:37.6875272Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-08-14T21:18:37.6929811Z Entering 'third_party/fbgemm/external/json' 2025-08-14T21:18:37.6995024Z Entering 'third_party/flash-attention' 2025-08-14T21:18:37.7038586Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-08-14T21:18:37.7085415Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-08-14T21:18:37.7140076Z Entering 'third_party/flatbuffers' 2025-08-14T21:18:37.7197871Z Entering 'third_party/fmt' 2025-08-14T21:18:37.7249242Z Entering 'third_party/gemmlowp/gemmlowp' 2025-08-14T21:18:37.7287074Z Entering 'third_party/gloo' 2025-08-14T21:18:37.7330783Z Entering 'third_party/googletest' 2025-08-14T21:18:37.7376243Z Entering 'third_party/ideep' 2025-08-14T21:18:37.7438894Z Entering 'third_party/ideep/mkl-dnn' 2025-08-14T21:18:37.7492637Z Entering 'third_party/ittapi' 2025-08-14T21:18:37.7531023Z Entering 'third_party/kineto' 2025-08-14T21:18:37.7568798Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-08-14T21:18:37.7616763Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-08-14T21:18:37.7671610Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-08-14T21:18:37.7720572Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-08-14T21:18:37.7768320Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-08-14T21:18:37.7815946Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-08-14T21:18:37.7872837Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-08-14T21:18:37.7910119Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-08-14T21:18:37.7954879Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-08-14T21:18:37.8015311Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-08-14T21:18:37.8067631Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-08-14T21:18:37.8103262Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-08-14T21:18:37.8142539Z Entering 'third_party/kleidiai' 2025-08-14T21:18:37.8182207Z Entering 'third_party/mimalloc' 2025-08-14T21:18:37.8231339Z Entering 'third_party/nlohmann' 2025-08-14T21:18:37.8283588Z Entering 'third_party/onnx' 2025-08-14T21:18:37.8345372Z Entering 'third_party/onnx/third_party/pybind11' 2025-08-14T21:18:37.8399022Z Entering 'third_party/opentelemetry-cpp' 2025-08-14T21:18:37.8438028Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-08-14T21:18:37.8477362Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-08-14T21:18:37.8512802Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-08-14T21:18:37.8548040Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-08-14T21:18:37.8585460Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-08-14T21:18:37.8623070Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-08-14T21:18:37.8658020Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-08-14T21:18:37.8692867Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-08-14T21:18:37.8735850Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-08-14T21:18:37.8773308Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-08-14T21:18:37.8831472Z Entering 'third_party/pocketfft' 2025-08-14T21:18:37.8880930Z Entering 'third_party/protobuf' 2025-08-14T21:18:37.8923881Z Entering 'third_party/protobuf/third_party/benchmark' 2025-08-14T21:18:37.8960275Z Entering 'third_party/protobuf/third_party/googletest' 2025-08-14T21:18:37.9000090Z Entering 'third_party/psimd' 2025-08-14T21:18:37.9040005Z Entering 'third_party/pthreadpool' 2025-08-14T21:18:37.9078657Z Entering 'third_party/pybind11' 2025-08-14T21:18:37.9116607Z Entering 'third_party/python-peachpy' 2025-08-14T21:18:37.9153909Z Entering 'third_party/sleef' 2025-08-14T21:18:37.9198982Z Entering 'third_party/tensorpipe' 2025-08-14T21:18:37.9237960Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-08-14T21:18:37.9280839Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-08-14T21:18:37.9319575Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-08-14T21:18:37.9370901Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-08-14T21:18:37.9409601Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-08-14T21:18:37.9476369Z [command]/usr/bin/git config --local --name-only --get-regexp http\.https\:\/\/github\.com\/\.extraheader 2025-08-14T21:18:37.9526410Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'http\.https\:\/\/github\.com\/\.extraheader' && git config --local --unset-all 'http.https://github.com/.extraheader' || :" 2025-08-14T21:18:37.9866976Z Entering 'android/libs/fbjni' 2025-08-14T21:18:37.9933702Z Entering 'third_party/FP16' 2025-08-14T21:18:37.9989768Z Entering 'third_party/FXdiv' 2025-08-14T21:18:38.0034032Z Entering 'third_party/NNPACK' 2025-08-14T21:18:38.0071911Z Entering 'third_party/NVTX' 2025-08-14T21:18:38.0120164Z Entering 'third_party/VulkanMemoryAllocator' 2025-08-14T21:18:38.0182177Z Entering 'third_party/XNNPACK' 2025-08-14T21:18:38.0264676Z Entering 'third_party/aiter' 2025-08-14T21:18:38.0319558Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-08-14T21:18:38.0391179Z Entering 'third_party/benchmark' 2025-08-14T21:18:38.0447403Z Entering 'third_party/composable_kernel' 2025-08-14T21:18:38.0494345Z Entering 'third_party/cpp-httplib' 2025-08-14T21:18:38.0542428Z Entering 'third_party/cpuinfo' 2025-08-14T21:18:38.0580844Z Entering 'third_party/cudnn_frontend' 2025-08-14T21:18:38.0617567Z Entering 'third_party/cutlass' 2025-08-14T21:18:38.0690509Z Entering 'third_party/fbgemm' 2025-08-14T21:18:38.0755403Z Entering 'third_party/fbgemm/external/asmjit' 2025-08-14T21:18:38.0815591Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-08-14T21:18:38.0889908Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-08-14T21:18:38.0954482Z Entering 'third_party/fbgemm/external/cutlass' 2025-08-14T21:18:38.1029905Z Entering 'third_party/fbgemm/external/googletest' 2025-08-14T21:18:38.1091215Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-08-14T21:18:38.1145908Z Entering 'third_party/fbgemm/external/json' 2025-08-14T21:18:38.1194628Z Entering 'third_party/flash-attention' 2025-08-14T21:18:38.1231615Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-08-14T21:18:38.1294025Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-08-14T21:18:38.1344178Z Entering 'third_party/flatbuffers' 2025-08-14T21:18:38.1393376Z Entering 'third_party/fmt' 2025-08-14T21:18:38.1433607Z Entering 'third_party/gemmlowp/gemmlowp' 2025-08-14T21:18:38.1468572Z Entering 'third_party/gloo' 2025-08-14T21:18:38.1515898Z Entering 'third_party/googletest' 2025-08-14T21:18:38.1557381Z Entering 'third_party/ideep' 2025-08-14T21:18:38.1594974Z Entering 'third_party/ideep/mkl-dnn' 2025-08-14T21:18:38.1638433Z Entering 'third_party/ittapi' 2025-08-14T21:18:38.1692864Z Entering 'third_party/kineto' 2025-08-14T21:18:38.1730325Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-08-14T21:18:38.1768139Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-08-14T21:18:38.1819185Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-08-14T21:18:38.1868047Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-08-14T21:18:38.1906067Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-08-14T21:18:38.1953027Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-08-14T21:18:38.2003957Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-08-14T21:18:38.2043661Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-08-14T21:18:38.2085605Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-08-14T21:18:38.2128043Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-08-14T21:18:38.2189093Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-08-14T21:18:38.2226870Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-08-14T21:18:38.2264936Z Entering 'third_party/kleidiai' 2025-08-14T21:18:38.2303993Z Entering 'third_party/mimalloc' 2025-08-14T21:18:38.2349223Z Entering 'third_party/nlohmann' 2025-08-14T21:18:38.2394397Z Entering 'third_party/onnx' 2025-08-14T21:18:38.2448797Z Entering 'third_party/onnx/third_party/pybind11' 2025-08-14T21:18:38.2527320Z Entering 'third_party/opentelemetry-cpp' 2025-08-14T21:18:38.2583622Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-08-14T21:18:38.2639725Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-08-14T21:18:38.2686108Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-08-14T21:18:38.2726171Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-08-14T21:18:38.2774830Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-08-14T21:18:38.2820433Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-08-14T21:18:38.2855462Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-08-14T21:18:38.2893439Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-08-14T21:18:38.2943160Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-08-14T21:18:38.2983912Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-08-14T21:18:38.3051353Z Entering 'third_party/pocketfft' 2025-08-14T21:18:38.3099609Z Entering 'third_party/protobuf' 2025-08-14T21:18:38.3157371Z Entering 'third_party/protobuf/third_party/benchmark' 2025-08-14T21:18:38.3202146Z Entering 'third_party/protobuf/third_party/googletest' 2025-08-14T21:18:38.3247710Z Entering 'third_party/psimd' 2025-08-14T21:18:38.3286791Z Entering 'third_party/pthreadpool' 2025-08-14T21:18:38.3329130Z Entering 'third_party/pybind11' 2025-08-14T21:18:38.3366752Z Entering 'third_party/python-peachpy' 2025-08-14T21:18:38.3409475Z Entering 'third_party/sleef' 2025-08-14T21:18:38.3447240Z Entering 'third_party/tensorpipe' 2025-08-14T21:18:38.3489456Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-08-14T21:18:38.3528474Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-08-14T21:18:38.3563306Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-08-14T21:18:38.3610369Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-08-14T21:18:38.3662804Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-08-14T21:18:38.3756788Z [command]/usr/bin/git config --local http.https://github.com/.extraheader AUTHORIZATION: basic *** 2025-08-14T21:18:38.3819365Z ##[endgroup] 2025-08-14T21:18:38.3820134Z ##[group]Fetching the repository 2025-08-14T21:18:38.3830452Z [command]/usr/bin/git -c protocol.version=2 fetch --prune --no-recurse-submodules origin +refs/heads/*:refs/remotes/origin/* +refs/tags/*:refs/tags/* 2025-08-14T21:18:39.0314454Z From https://github.com/pytorch/pytorch 2025-08-14T21:18:39.0315370Z - [deleted] (none) -> ciflow/inductor/155263 2025-08-14T21:18:39.0968801Z - [deleted] (none) -> ciflow/inductor/159530 2025-08-14T21:18:39.0969612Z - [deleted] (none) -> ciflow/inductor/160000 2025-08-14T21:18:39.0970688Z - [deleted] (none) -> ciflow/inductor/160605 2025-08-14T21:18:39.0971613Z - [deleted] (none) -> ciflow/inductor/160645 2025-08-14T21:18:39.0974553Z - [deleted] (none) -> ciflow/trunk/155263 2025-08-14T21:18:39.0975348Z - [deleted] (none) -> ciflow/trunk/158137 2025-08-14T21:18:39.0976714Z - [deleted] (none) -> ciflow/trunk/159530 2025-08-14T21:18:39.0978287Z - [deleted] (none) -> ciflow/trunk/160000 2025-08-14T21:18:39.0979428Z - [deleted] (none) -> ciflow/trunk/160008 2025-08-14T21:18:39.0980913Z - [deleted] (none) -> ciflow/trunk/160562 2025-08-14T21:18:39.0982020Z - [deleted] (none) -> ciflow/trunk/160605 2025-08-14T21:18:39.0983326Z - [deleted] (none) -> ciflow/trunk/160636 2025-08-14T21:18:39.0984715Z - [deleted] (none) -> ciflow/trunk/160643 2025-08-14T21:18:39.0986090Z - [deleted] (none) -> ciflow/trunk/160645 2025-08-14T21:18:40.8077554Z 78538ad3e12..c4cb9008294 autoupdate-transformers-pin-via-pr -> origin/autoupdate-transformers-pin-via-pr 2025-08-14T21:18:40.8158806Z + 18b763f5271...4bc85c55ba9 exclamaforte/just-gemm-model -> origin/exclamaforte/just-gemm-model (forced update) 2025-08-14T21:18:40.8194319Z * [new branch] gh/eellison/813/base -> origin/gh/eellison/813/base 2025-08-14T21:18:40.8196594Z * [new branch] gh/eellison/813/head -> origin/gh/eellison/813/head 2025-08-14T21:18:40.8201440Z * [new branch] gh/eellison/813/orig -> origin/gh/eellison/813/orig 2025-08-14T21:18:40.8214559Z 03f2c0ce4ea..8dcf14991ff gh/guilhermeleobas/147/head -> origin/gh/guilhermeleobas/147/head 2025-08-14T21:18:40.8216109Z + bcac008b1bc...9a2c169f560 gh/guilhermeleobas/147/orig -> origin/gh/guilhermeleobas/147/orig (forced update) 2025-08-14T21:18:40.8221524Z 6646c3dfa98..e10ecdec4fc gh/guilhermeleobas/214/base -> origin/gh/guilhermeleobas/214/base 2025-08-14T21:18:40.8223483Z a0786e4dfb6..44585d2f482 gh/guilhermeleobas/214/head -> origin/gh/guilhermeleobas/214/head 2025-08-14T21:18:40.8225954Z + 7e281b54e69...9e777070cd7 gh/guilhermeleobas/214/orig -> origin/gh/guilhermeleobas/214/orig (forced update) 2025-08-14T21:18:40.8228480Z 0e2e4700677..c7bdd8517c7 gh/guilhermeleobas/215/base -> origin/gh/guilhermeleobas/215/base 2025-08-14T21:18:40.8231417Z bc418a1823f..c36bdef982a gh/guilhermeleobas/215/head -> origin/gh/guilhermeleobas/215/head 2025-08-14T21:18:40.8233696Z + 339f2ac02fc...65bcf0cf22c gh/guilhermeleobas/215/orig -> origin/gh/guilhermeleobas/215/orig (forced update) 2025-08-14T21:18:40.8235830Z 9f7ab8d153b..37addbd5732 gh/guilhermeleobas/216/base -> origin/gh/guilhermeleobas/216/base 2025-08-14T21:18:40.8238520Z cbb9ebfa7db..d6b43a6cd4e gh/guilhermeleobas/216/head -> origin/gh/guilhermeleobas/216/head 2025-08-14T21:18:40.8240989Z + 3030a7fb87f...0768ffd59f3 gh/guilhermeleobas/216/orig -> origin/gh/guilhermeleobas/216/orig (forced update) 2025-08-14T21:18:40.8243334Z 329c5274318..87732b2c98e gh/guilhermeleobas/217/base -> origin/gh/guilhermeleobas/217/base 2025-08-14T21:18:40.8246151Z a2ed4621e3c..5556c80b778 gh/guilhermeleobas/217/head -> origin/gh/guilhermeleobas/217/head 2025-08-14T21:18:40.8248379Z + c1fdf0419c3...c6b3f286785 gh/guilhermeleobas/217/orig -> origin/gh/guilhermeleobas/217/orig (forced update) 2025-08-14T21:18:40.8250848Z 14a3891c9ab..e4dcf56a427 gh/guilhermeleobas/220/base -> origin/gh/guilhermeleobas/220/base 2025-08-14T21:18:40.8253581Z f9eca810cbb..2f44712cf54 gh/guilhermeleobas/220/head -> origin/gh/guilhermeleobas/220/head 2025-08-14T21:18:40.8255794Z + e6b771a3a3a...77fbab074fc gh/guilhermeleobas/220/orig -> origin/gh/guilhermeleobas/220/orig (forced update) 2025-08-14T21:18:40.8258299Z 60127e9851d..3d434480221 gh/guilhermeleobas/222/base -> origin/gh/guilhermeleobas/222/base 2025-08-14T21:18:40.8261023Z 49a567502c2..347333b3995 gh/guilhermeleobas/222/head -> origin/gh/guilhermeleobas/222/head 2025-08-14T21:18:40.8263292Z + 37072dc6925...9ee0a5f1e67 gh/guilhermeleobas/222/orig -> origin/gh/guilhermeleobas/222/orig (forced update) 2025-08-14T21:18:40.8265684Z 02f81cd7750..72cea7eea1f gh/guilhermeleobas/223/base -> origin/gh/guilhermeleobas/223/base 2025-08-14T21:18:40.8268407Z 7f0483b79d9..64b4dccb023 gh/guilhermeleobas/223/head -> origin/gh/guilhermeleobas/223/head 2025-08-14T21:18:40.8270727Z + b39476f5c91...03b5add0bd3 gh/guilhermeleobas/223/orig -> origin/gh/guilhermeleobas/223/orig (forced update) 2025-08-14T21:18:40.8273020Z cc6f595d52d..2a152d3688b gh/guilhermeleobas/224/base -> origin/gh/guilhermeleobas/224/base 2025-08-14T21:18:40.8275507Z dcba0f797c9..9ba4773891f gh/guilhermeleobas/224/head -> origin/gh/guilhermeleobas/224/head 2025-08-14T21:18:40.8277780Z + a07cf993edd...dbe36858fef gh/guilhermeleobas/224/orig -> origin/gh/guilhermeleobas/224/orig (forced update) 2025-08-14T21:18:40.8305797Z ea5b50f2a44..387d4ae35ce gh/xmfan/274/base -> origin/gh/xmfan/274/base 2025-08-14T21:18:40.8309466Z b87cd6dc372..4d057fb84e3 gh/xmfan/274/head -> origin/gh/xmfan/274/head 2025-08-14T21:18:40.8314816Z + 16c7cc80222...ac149060558 gh/xmfan/274/orig -> origin/gh/xmfan/274/orig (forced update) 2025-08-14T21:18:40.8317232Z * [new branch] gh/xmfan/277/base -> origin/gh/xmfan/277/base 2025-08-14T21:18:40.8318541Z * [new branch] gh/xmfan/277/head -> origin/gh/xmfan/277/head 2025-08-14T21:18:40.8320136Z * [new branch] gh/xmfan/277/orig -> origin/gh/xmfan/277/orig 2025-08-14T21:18:40.8324547Z 7f5d2fe3aef..16b50f866ad gh/yangw-dev/14/base -> origin/gh/yangw-dev/14/base 2025-08-14T21:18:40.8326358Z 97d57cea8ee..14ac4b0cd8e gh/yangw-dev/14/head -> origin/gh/yangw-dev/14/head 2025-08-14T21:18:40.8328779Z + a54e773a852...196b71c5f3f gh/yangw-dev/14/orig -> origin/gh/yangw-dev/14/orig (forced update) 2025-08-14T21:18:40.8330958Z 4576f5bc30f..66605343d69 gh/yangw-dev/17/base -> origin/gh/yangw-dev/17/base 2025-08-14T21:18:40.8333237Z 497de567228..44feacc104c gh/yangw-dev/17/head -> origin/gh/yangw-dev/17/head 2025-08-14T21:18:40.8335434Z + 5e19e3044b0...cd4ff57277f gh/yangw-dev/17/orig -> origin/gh/yangw-dev/17/orig (forced update) 2025-08-14T21:18:40.8337684Z 4bb8d6ee236..8e0d31dd9a7 gh/yangw-dev/19/base -> origin/gh/yangw-dev/19/base 2025-08-14T21:18:40.8340198Z ff64e4ec359..7f2192ddf6c gh/yangw-dev/19/head -> origin/gh/yangw-dev/19/head 2025-08-14T21:18:40.8342218Z + 47aa0c2599e...c241d1d9683 gh/yangw-dev/19/orig -> origin/gh/yangw-dev/19/orig (forced update) 2025-08-14T21:18:40.8345178Z b74d09ea9cb..c6f893ae6a2 gh/ydwu4/253/base -> origin/gh/ydwu4/253/base 2025-08-14T21:18:40.8347141Z 343c95790d8..ec23555a649 gh/ydwu4/253/head -> origin/gh/ydwu4/253/head 2025-08-14T21:18:40.8349347Z + 46a1a2cb3f1...42603577470 gh/ydwu4/253/orig -> origin/gh/ydwu4/253/orig (forced update) 2025-08-14T21:18:40.8352851Z ccedc96dd99..4c947d82af4 gh/ydwu4/289/base -> origin/gh/ydwu4/289/base 2025-08-14T21:18:40.8354820Z 5635e90ad37..9faad049c4e gh/ydwu4/289/head -> origin/gh/ydwu4/289/head 2025-08-14T21:18:40.8357062Z + b9dc464b082...249b3f1a4cc gh/ydwu4/289/orig -> origin/gh/ydwu4/289/orig (forced update) 2025-08-14T21:18:40.8359185Z c6f4bbfe24a..56500da2a46 gh/ydwu4/290/base -> origin/gh/ydwu4/290/base 2025-08-14T21:18:40.8361531Z fb52b34d9d8..a9f4e34e95a gh/ydwu4/290/head -> origin/gh/ydwu4/290/head 2025-08-14T21:18:40.8363786Z + 4aee28d7cba...84588c9cecc gh/ydwu4/290/orig -> origin/gh/ydwu4/290/orig (forced update) 2025-08-14T21:18:40.8365899Z eafc551da2d..77fe4d9d877 gh/ydwu4/291/base -> origin/gh/ydwu4/291/base 2025-08-14T21:18:40.8368096Z 1eff640ef2a..884e299f170 gh/ydwu4/291/head -> origin/gh/ydwu4/291/head 2025-08-14T21:18:40.8370297Z + 535f628315d...561d588cae4 gh/ydwu4/291/orig -> origin/gh/ydwu4/291/orig (forced update) 2025-08-14T21:18:40.8372567Z 95767ac464a..047f5842de9 gh/ydwu4/292/base -> origin/gh/ydwu4/292/base 2025-08-14T21:18:40.8374687Z 855491e7d90..ca7a5d62924 gh/ydwu4/292/head -> origin/gh/ydwu4/292/head 2025-08-14T21:18:40.8376855Z + 281c90bb098...59233551ea9 gh/ydwu4/292/orig -> origin/gh/ydwu4/292/orig (forced update) 2025-08-14T21:18:40.8378890Z 90531c229e5..deb9cd690fc gh/ydwu4/293/base -> origin/gh/ydwu4/293/base 2025-08-14T21:18:40.8381016Z 0250adb10dc..5e052917f0f gh/ydwu4/293/head -> origin/gh/ydwu4/293/head 2025-08-14T21:18:40.8383105Z + 30b21ad5427...ffdc931dbb5 gh/ydwu4/293/orig -> origin/gh/ydwu4/293/orig (forced update) 2025-08-14T21:18:40.8385311Z 8d1de9cf2a4..37e3590e3f3 gh/ydwu4/295/base -> origin/gh/ydwu4/295/base 2025-08-14T21:18:40.8387681Z 3ec3a23927b..b43dcce75fd gh/ydwu4/295/head -> origin/gh/ydwu4/295/head 2025-08-14T21:18:40.8389888Z + b3ae71c07a2...b3c1981a98a gh/ydwu4/295/orig -> origin/gh/ydwu4/295/orig (forced update) 2025-08-14T21:18:40.8391910Z dda63ad5505..dcfb17c44e4 gh/ydwu4/296/base -> origin/gh/ydwu4/296/base 2025-08-14T21:18:40.8394357Z f5a3014ee02..19f4a9b32ef gh/ydwu4/296/head -> origin/gh/ydwu4/296/head 2025-08-14T21:18:40.8396178Z + 9b776485c6b...9ee4bfc6f03 gh/ydwu4/296/orig -> origin/gh/ydwu4/296/orig (forced update) 2025-08-14T21:18:40.8398841Z fc25c68f20f..c107eb8fa07 gh/ydwu4/301/base -> origin/gh/ydwu4/301/base 2025-08-14T21:18:40.8400908Z 2b889529039..d53b6450b00 gh/ydwu4/301/head -> origin/gh/ydwu4/301/head 2025-08-14T21:18:40.8403296Z + 3e899b2ce51...1512814970a gh/ydwu4/301/orig -> origin/gh/ydwu4/301/orig (forced update) 2025-08-14T21:18:40.8405797Z 2b889529039..b0a7fdb5401 gh/ydwu4/305/base -> origin/gh/ydwu4/305/base 2025-08-14T21:18:40.8407924Z 7eeafe602af..538a08b69f2 gh/ydwu4/305/head -> origin/gh/ydwu4/305/head 2025-08-14T21:18:40.8410311Z + 4d734258b6d...48e3105c227 gh/ydwu4/305/orig -> origin/gh/ydwu4/305/orig (forced update) 2025-08-14T21:18:40.8412318Z 867bd790e14..e8622caa50f gh/ydwu4/306/base -> origin/gh/ydwu4/306/base 2025-08-14T21:18:40.8414439Z 9973dd5f806..14d2597c77a gh/ydwu4/306/head -> origin/gh/ydwu4/306/head 2025-08-14T21:18:40.8416700Z + 1d368e3911c...66680d2c273 gh/ydwu4/306/orig -> origin/gh/ydwu4/306/orig (forced update) 2025-08-14T21:18:40.8418738Z 9973dd5f806..a42184b16d8 gh/ydwu4/307/base -> origin/gh/ydwu4/307/base 2025-08-14T21:18:40.8420796Z 9fa0484136c..a48d6f473d9 gh/ydwu4/307/head -> origin/gh/ydwu4/307/head 2025-08-14T21:18:40.8423046Z + 7e922ff99d8...dd60840ccfd gh/ydwu4/307/orig -> origin/gh/ydwu4/307/orig (forced update) 2025-08-14T21:18:40.8425198Z 12913a9a722..39133df24c2 gh/ydwu4/308/head -> origin/gh/ydwu4/308/head 2025-08-14T21:18:40.8427274Z + cee669623d0...279de9fc5c6 gh/ydwu4/308/orig -> origin/gh/ydwu4/308/orig (forced update) 2025-08-14T21:18:40.8430004Z * [new branch] gh/ydwu4/309/base -> origin/gh/ydwu4/309/base 2025-08-14T21:18:40.8431454Z * [new branch] gh/ydwu4/309/head -> origin/gh/ydwu4/309/head 2025-08-14T21:18:40.8433079Z * [new branch] gh/ydwu4/309/orig -> origin/gh/ydwu4/309/orig 2025-08-14T21:18:40.8435726Z * [new branch] gh/ydwu4/310/base -> origin/gh/ydwu4/310/base 2025-08-14T21:18:40.8437217Z * [new branch] gh/ydwu4/310/head -> origin/gh/ydwu4/310/head 2025-08-14T21:18:40.8438806Z * [new branch] gh/ydwu4/310/orig -> origin/gh/ydwu4/310/orig 2025-08-14T21:18:40.8441526Z * [new branch] gh/ydwu4/311/base -> origin/gh/ydwu4/311/base 2025-08-14T21:18:40.8443054Z * [new branch] gh/ydwu4/311/head -> origin/gh/ydwu4/311/head 2025-08-14T21:18:40.8444633Z * [new branch] gh/ydwu4/311/orig -> origin/gh/ydwu4/311/orig 2025-08-14T21:18:40.8451354Z 0d3461bac0f..3be70dc30e8 main -> origin/main 2025-08-14T21:18:40.8455998Z + e33a33da492...8ac9fe6cbf8 mlazos/tuple-fixes2 -> origin/mlazos/tuple-fixes2 (forced update) 2025-08-14T21:18:40.8461672Z + fd5bdcd271d...0203223a156 update_submodule_FBGEMM -> origin/update_submodule_FBGEMM (forced update) 2025-08-14T21:18:40.8463599Z e4de93f6a3e..7e27347fd35 viable/strict -> origin/viable/strict 2025-08-14T21:18:40.8467387Z t [tag update] ciflow/h100-symm-mem/159562 -> ciflow/h100-symm-mem/159562 2025-08-14T21:18:40.8469196Z t [tag update] ciflow/inductor/153966 -> ciflow/inductor/153966 2025-08-14T21:18:40.8470685Z t [tag update] ciflow/inductor/154193 -> ciflow/inductor/154193 2025-08-14T21:18:40.8472024Z * [new tag] ciflow/inductor/156851 -> ciflow/inductor/156851 2025-08-14T21:18:40.8474005Z t [tag update] ciflow/inductor/159009 -> ciflow/inductor/159009 2025-08-14T21:18:40.8475496Z t [tag update] ciflow/inductor/159010 -> ciflow/inductor/159010 2025-08-14T21:18:40.8477550Z * [new tag] ciflow/inductor/159365 -> ciflow/inductor/159365 2025-08-14T21:18:40.8478584Z t [tag update] ciflow/inductor/159366 -> ciflow/inductor/159366 2025-08-14T21:18:40.8479825Z t [tag update] ciflow/inductor/159367 -> ciflow/inductor/159367 2025-08-14T21:18:40.8481539Z t [tag update] ciflow/inductor/159368 -> ciflow/inductor/159368 2025-08-14T21:18:40.8483084Z t [tag update] ciflow/inductor/159483 -> ciflow/inductor/159483 2025-08-14T21:18:40.8484799Z t [tag update] ciflow/inductor/159864 -> ciflow/inductor/159864 2025-08-14T21:18:40.8486329Z t [tag update] ciflow/inductor/159865 -> ciflow/inductor/159865 2025-08-14T21:18:40.8487915Z t [tag update] ciflow/inductor/159875 -> ciflow/inductor/159875 2025-08-14T21:18:40.8489394Z t [tag update] ciflow/inductor/159902 -> ciflow/inductor/159902 2025-08-14T21:18:40.8491407Z t [tag update] ciflow/inductor/160201 -> ciflow/inductor/160201 2025-08-14T21:18:40.8493351Z t [tag update] ciflow/inductor/160374 -> ciflow/inductor/160374 2025-08-14T21:18:40.8495272Z t [tag update] ciflow/inductor/160467 -> ciflow/inductor/160467 2025-08-14T21:18:40.8496850Z t [tag update] ciflow/inductor/160476 -> ciflow/inductor/160476 2025-08-14T21:18:40.8498436Z t [tag update] ciflow/inductor/160483 -> ciflow/inductor/160483 2025-08-14T21:18:40.8501120Z t [tag update] ciflow/inductor/160539 -> ciflow/inductor/160539 2025-08-14T21:18:40.8502884Z t [tag update] ciflow/inductor/160540 -> ciflow/inductor/160540 2025-08-14T21:18:40.8504399Z t [tag update] ciflow/inductor/160548 -> ciflow/inductor/160548 2025-08-14T21:18:40.8505960Z t [tag update] ciflow/inductor/160583 -> ciflow/inductor/160583 2025-08-14T21:18:40.8507750Z t [tag update] ciflow/inductor/160592 -> ciflow/inductor/160592 2025-08-14T21:18:40.8509501Z t [tag update] ciflow/inductor/160635 -> ciflow/inductor/160635 2025-08-14T21:18:40.8510789Z * [new tag] ciflow/inductor/160668 -> ciflow/inductor/160668 2025-08-14T21:18:40.8511818Z * [new tag] ciflow/inductor/160669 -> ciflow/inductor/160669 2025-08-14T21:18:40.8512919Z * [new tag] ciflow/inductor/160670 -> ciflow/inductor/160670 2025-08-14T21:18:40.8514293Z * [new tag] ciflow/inductor/160671 -> ciflow/inductor/160671 2025-08-14T21:18:40.8515644Z * [new tag] ciflow/inductor/160677 -> ciflow/inductor/160677 2025-08-14T21:18:40.8516841Z * [new tag] ciflow/inductor/160679 -> ciflow/inductor/160679 2025-08-14T21:18:40.8518813Z t [tag update] ciflow/periodic/160201 -> ciflow/periodic/160201 2025-08-14T21:18:40.8521644Z t [tag update] ciflow/trunk/154193 -> ciflow/trunk/154193 2025-08-14T21:18:40.8522616Z * [new tag] ciflow/trunk/156851 -> ciflow/trunk/156851 2025-08-14T21:18:40.8523643Z * [new tag] ciflow/trunk/157152 -> ciflow/trunk/157152 2025-08-14T21:18:40.8524765Z * [new tag] ciflow/trunk/158810 -> ciflow/trunk/158810 2025-08-14T21:18:40.8525755Z * [new tag] ciflow/trunk/158812 -> ciflow/trunk/158812 2025-08-14T21:18:40.8527314Z t [tag update] ciflow/trunk/158863 -> ciflow/trunk/158863 2025-08-14T21:18:40.8528746Z t [tag update] ciflow/trunk/158864 -> ciflow/trunk/158864 2025-08-14T21:18:40.8530176Z t [tag update] ciflow/trunk/158883 -> ciflow/trunk/158883 2025-08-14T21:18:40.8531685Z t [tag update] ciflow/trunk/158965 -> ciflow/trunk/158965 2025-08-14T21:18:40.8533197Z t [tag update] ciflow/trunk/159562 -> ciflow/trunk/159562 2025-08-14T21:18:40.8580451Z * [new tag] ciflow/trunk/159682 -> ciflow/trunk/159682 2025-08-14T21:18:40.8581467Z * [new tag] ciflow/trunk/160219 -> ciflow/trunk/160219 2025-08-14T21:18:40.8582275Z * [new tag] ciflow/trunk/160430 -> ciflow/trunk/160430 2025-08-14T21:18:40.8583045Z t [tag update] ciflow/trunk/160592 -> ciflow/trunk/160592 2025-08-14T21:18:40.8583807Z * [new tag] ciflow/trunk/160656 -> ciflow/trunk/160656 2025-08-14T21:18:40.8584547Z t [tag update] ciflow/vllm/160116 -> ciflow/vllm/160116 2025-08-14T21:18:40.8585309Z t [tag update] ciflow/vllm/160583 -> ciflow/vllm/160583 2025-08-14T21:18:40.8586518Z t [tag update] ciflow/win-arm64/159562 -> ciflow/win-arm64/159562 2025-08-14T21:18:40.8587691Z * [new tag] trunk/1028c5e2d50e121865bf98307e7c035f549a24b2 -> trunk/1028c5e2d50e121865bf98307e7c035f549a24b2 2025-08-14T21:18:40.8589153Z * [new tag] trunk/19b4283884b2d9b3a0eb364da10b1540d14ab7a7 -> trunk/19b4283884b2d9b3a0eb364da10b1540d14ab7a7 2025-08-14T21:18:40.8590568Z * [new tag] trunk/1fc683cf17c8c673044538d10266c00f92987be2 -> trunk/1fc683cf17c8c673044538d10266c00f92987be2 2025-08-14T21:18:40.8591961Z * [new tag] trunk/3be70dc30e893b552fc0f23ca06cd8f7949b6d08 -> trunk/3be70dc30e893b552fc0f23ca06cd8f7949b6d08 2025-08-14T21:18:40.8593371Z * [new tag] trunk/3fe19a7a0af3f4d692af30476c320be18c7e8ae6 -> trunk/3fe19a7a0af3f4d692af30476c320be18c7e8ae6 2025-08-14T21:18:40.8594781Z * [new tag] trunk/47a1db823dfcdacdb99f317428fc3791a18c5812 -> trunk/47a1db823dfcdacdb99f317428fc3791a18c5812 2025-08-14T21:18:40.8596246Z * [new tag] trunk/4a90dc0c1f68d1f98832b169f792ed1bb195a0f3 -> trunk/4a90dc0c1f68d1f98832b169f792ed1bb195a0f3 2025-08-14T21:18:40.8597711Z * [new tag] trunk/8d6d3246316e1767a57d5e855acd6208da753b75 -> trunk/8d6d3246316e1767a57d5e855acd6208da753b75 2025-08-14T21:18:40.8599104Z * [new tag] trunk/b9d7de3a094598c3dc0dd52e57bce30eb684c9d8 -> trunk/b9d7de3a094598c3dc0dd52e57bce30eb684c9d8 2025-08-14T21:18:40.8600663Z * [new tag] trunk/eac2d9d695a32dd456050f45cac35134ec3809f4 -> trunk/eac2d9d695a32dd456050f45cac35134ec3809f4 2025-08-14T21:18:40.8602102Z * [new tag] trunk/fdfd69bb05488d76123db9cc1cdd90ac4137bbfb -> trunk/fdfd69bb05488d76123db9cc1cdd90ac4137bbfb 2025-08-14T21:18:40.9466320Z [command]/usr/bin/git rev-parse --verify --quiet 1fc683cf17c8c673044538d10266c00f92987be2^{object} 2025-08-14T21:18:40.9510523Z 1fc683cf17c8c673044538d10266c00f92987be2 2025-08-14T21:18:40.9519741Z ##[endgroup] 2025-08-14T21:18:40.9520507Z ##[group]Determining the checkout info 2025-08-14T21:18:40.9523481Z ##[endgroup] 2025-08-14T21:18:40.9532207Z [command]/usr/bin/git sparse-checkout disable 2025-08-14T21:18:40.9832690Z [command]/usr/bin/git config --local --unset-all extensions.worktreeConfig 2025-08-14T21:18:40.9878966Z ##[group]Checking out the ref 2025-08-14T21:18:40.9887853Z [command]/usr/bin/git checkout --progress --force 1fc683cf17c8c673044538d10266c00f92987be2 2025-08-14T21:18:41.1576048Z Previous HEAD position was 65053c03a3d [FR] Don't check incomplete ranks for printing (#160195) 2025-08-14T21:18:41.1583874Z HEAD is now at 1fc683cf17c [Inductor] Allow indexing a flexible layout for extract_input_node_reduction_ranges (#160645) 2025-08-14T21:18:41.1631024Z ##[endgroup] 2025-08-14T21:18:41.1631746Z ##[group]Setting up auth for fetching submodules 2025-08-14T21:18:41.1638372Z [command]/usr/bin/git config --global http.https://github.com/.extraheader AUTHORIZATION: basic *** 2025-08-14T21:18:41.1705177Z [command]/usr/bin/git config --global --unset-all url.https://github.com/.insteadOf 2025-08-14T21:18:41.1753356Z [command]/usr/bin/git config --global --add url.https://github.com/.insteadOf git@github.com: 2025-08-14T21:18:41.1802087Z [command]/usr/bin/git config --global --add url.https://github.com/.insteadOf org-21003710@github.com: 2025-08-14T21:18:41.1843647Z ##[endgroup] 2025-08-14T21:18:41.1844342Z ##[group]Fetching submodules 2025-08-14T21:18:41.1850599Z [command]/usr/bin/git submodule sync --recursive 2025-08-14T21:18:41.2270939Z Synchronizing submodule url for 'android/libs/fbjni' 2025-08-14T21:18:41.2336977Z Synchronizing submodule url for 'third_party/FP16' 2025-08-14T21:18:41.2406390Z Synchronizing submodule url for 'third_party/FXdiv' 2025-08-14T21:18:41.2477080Z Synchronizing submodule url for 'third_party/NNPACK' 2025-08-14T21:18:41.2548237Z Synchronizing submodule url for 'third_party/NVTX' 2025-08-14T21:18:41.2617239Z Synchronizing submodule url for 'third_party/VulkanMemoryAllocator' 2025-08-14T21:18:41.2692808Z Synchronizing submodule url for 'third_party/XNNPACK' 2025-08-14T21:18:41.2778159Z Synchronizing submodule url for 'third_party/aiter' 2025-08-14T21:18:41.2846860Z Synchronizing submodule url for 'third_party/aiter/3rdparty/composable_kernel' 2025-08-14T21:18:41.2936826Z Synchronizing submodule url for 'third_party/benchmark' 2025-08-14T21:18:41.3006508Z Synchronizing submodule url for 'third_party/composable_kernel' 2025-08-14T21:18:41.3084721Z Synchronizing submodule url for 'third_party/cpp-httplib' 2025-08-14T21:18:41.3152250Z Synchronizing submodule url for 'third_party/cpuinfo' 2025-08-14T21:18:41.3230985Z Synchronizing submodule url for 'third_party/cudnn_frontend' 2025-08-14T21:18:41.3296228Z Synchronizing submodule url for 'third_party/cutlass' 2025-08-14T21:18:41.3388931Z Synchronizing submodule url for 'third_party/fbgemm' 2025-08-14T21:18:41.3459612Z Synchronizing submodule url for 'third_party/fbgemm/external/asmjit' 2025-08-14T21:18:41.3525390Z Synchronizing submodule url for 'third_party/fbgemm/external/composable_kernel' 2025-08-14T21:18:41.3594864Z Synchronizing submodule url for 'third_party/fbgemm/external/cpuinfo' 2025-08-14T21:18:41.3662345Z Synchronizing submodule url for 'third_party/fbgemm/external/cutlass' 2025-08-14T21:18:41.3737866Z Synchronizing submodule url for 'third_party/fbgemm/external/googletest' 2025-08-14T21:18:41.3806506Z Synchronizing submodule url for 'third_party/fbgemm/external/hipify_torch' 2025-08-14T21:18:41.3866997Z Synchronizing submodule url for 'third_party/fbgemm/external/json' 2025-08-14T21:18:41.3940572Z Synchronizing submodule url for 'third_party/flash-attention' 2025-08-14T21:18:41.4000818Z Synchronizing submodule url for 'third_party/flash-attention/csrc/composable_kernel' 2025-08-14T21:18:41.4071003Z Synchronizing submodule url for 'third_party/flash-attention/csrc/cutlass' 2025-08-14T21:18:41.4149147Z Synchronizing submodule url for 'third_party/flatbuffers' 2025-08-14T21:18:41.4214606Z Synchronizing submodule url for 'third_party/fmt' 2025-08-14T21:18:41.4277889Z Synchronizing submodule url for 'third_party/gemmlowp/gemmlowp' 2025-08-14T21:18:41.4342354Z Synchronizing submodule url for 'third_party/gloo' 2025-08-14T21:18:41.4407285Z Synchronizing submodule url for 'third_party/googletest' 2025-08-14T21:18:41.4476847Z Synchronizing submodule url for 'third_party/ideep' 2025-08-14T21:18:41.4542935Z Synchronizing submodule url for 'third_party/ideep/mkl-dnn' 2025-08-14T21:18:41.4626666Z Synchronizing submodule url for 'third_party/ittapi' 2025-08-14T21:18:41.4697907Z Synchronizing submodule url for 'third_party/kineto' 2025-08-14T21:18:41.4758402Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/dynolog' 2025-08-14T21:18:41.4823542Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-08-14T21:18:41.4886529Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-08-14T21:18:41.4948772Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-08-14T21:18:41.5011763Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-08-14T21:18:41.5074309Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-08-14T21:18:41.5141823Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-08-14T21:18:41.5200947Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-08-14T21:18:41.5264270Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-08-14T21:18:41.5339305Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-08-14T21:18:41.5415927Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/fmt' 2025-08-14T21:18:41.5478280Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/googletest' 2025-08-14T21:18:41.5546950Z Synchronizing submodule url for 'third_party/kleidiai' 2025-08-14T21:18:41.5618591Z Synchronizing submodule url for 'third_party/mimalloc' 2025-08-14T21:18:41.5688826Z Synchronizing submodule url for 'third_party/nlohmann' 2025-08-14T21:18:41.5755006Z Synchronizing submodule url for 'third_party/onnx' 2025-08-14T21:18:41.5837880Z Synchronizing submodule url for 'third_party/onnx/third_party/pybind11' 2025-08-14T21:18:41.5914275Z Synchronizing submodule url for 'third_party/opentelemetry-cpp' 2025-08-14T21:18:41.5974730Z Synchronizing submodule url for 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-08-14T21:18:41.6040722Z Synchronizing submodule url for 'third_party/opentelemetry-cpp/third_party/googletest' 2025-08-14T21:18:41.6104158Z Synchronizing submodule url for 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-08-14T21:18:41.6161348Z Synchronizing submodule url for 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-08-14T21:18:41.6225988Z Synchronizing submodule url for 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-08-14T21:18:41.6292536Z Synchronizing submodule url for 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-08-14T21:18:41.6353025Z Synchronizing submodule url for 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-08-14T21:18:41.6411593Z Synchronizing submodule url for 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-08-14T21:18:41.6473845Z Synchronizing submodule url for 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-08-14T21:18:41.6542664Z Synchronizing submodule url for 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-08-14T21:18:41.6638184Z Synchronizing submodule url for 'third_party/pocketfft' 2025-08-14T21:18:41.6704637Z Synchronizing submodule url for 'third_party/protobuf' 2025-08-14T21:18:41.6770634Z Synchronizing submodule url for 'third_party/protobuf/third_party/benchmark' 2025-08-14T21:18:41.6831707Z Synchronizing submodule url for 'third_party/protobuf/third_party/googletest' 2025-08-14T21:18:41.6897533Z Synchronizing submodule url for 'third_party/psimd' 2025-08-14T21:18:41.6960233Z Synchronizing submodule url for 'third_party/pthreadpool' 2025-08-14T21:18:41.7022510Z Synchronizing submodule url for 'third_party/pybind11' 2025-08-14T21:18:41.7092354Z Synchronizing submodule url for 'third_party/python-peachpy' 2025-08-14T21:18:41.7160918Z Synchronizing submodule url for 'third_party/sleef' 2025-08-14T21:18:41.7225475Z Synchronizing submodule url for 'third_party/tensorpipe' 2025-08-14T21:18:41.7293717Z Synchronizing submodule url for 'third_party/tensorpipe/third_party/googletest' 2025-08-14T21:18:41.7351556Z Synchronizing submodule url for 'third_party/tensorpipe/third_party/libnop' 2025-08-14T21:18:41.7411295Z Synchronizing submodule url for 'third_party/tensorpipe/third_party/libuv' 2025-08-14T21:18:41.7470727Z Synchronizing submodule url for 'third_party/tensorpipe/third_party/pybind11' 2025-08-14T21:18:41.7525758Z Synchronizing submodule url for 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-08-14T21:18:41.7627943Z [command]/usr/bin/git -c protocol.version=2 submodule update --init --force --recursive 2025-08-14T21:18:41.8378154Z Submodule path 'android/libs/fbjni': checked out '7e1e1fe3858c63c251c637ae41a20de425dde96f' 2025-08-14T21:18:41.8709205Z Submodule path 'third_party/FP16': checked out '4dfe081cf6bcd15db339cf2680b9281b8451eeb3' 2025-08-14T21:18:41.9087352Z Submodule path 'third_party/FXdiv': checked out 'b408327ac2a15ec3e43352421954f5b1967701d1' 2025-08-14T21:18:41.9493117Z Submodule path 'third_party/NNPACK': checked out 'c07e3a0400713d546e0dea2d5466dd22ea389c73' 2025-08-14T21:18:41.9927694Z Submodule path 'third_party/NVTX': checked out '2942f167cc30c5e3a44a2aecd5b0d9c07ff61a07' 2025-08-14T21:18:42.0364907Z Submodule path 'third_party/VulkanMemoryAllocator': checked out '1d8f600fd424278486eade7ed3e877c99f0846b1' 2025-08-14T21:18:42.1056454Z Submodule path 'third_party/XNNPACK': checked out '51a0103656eff6fc9bfd39a4597923c4b542c883' 2025-08-14T21:18:42.1604819Z Submodule path 'third_party/aiter': checked out '01aae101b9e5e94d6c16a9514c9fb8df99c93150' 2025-08-14T21:18:42.2235444Z Submodule path 'third_party/aiter/3rdparty/composable_kernel': checked out 'cffe8fa2a442ac8e80dd236a1a5d24fe3d7e0cbf' 2025-08-14T21:18:42.2679553Z Submodule path 'third_party/benchmark': checked out '299e5928955cc62af9968370293b916f5130916f' 2025-08-14T21:18:42.3409230Z Submodule path 'third_party/composable_kernel': checked out '7fe50dc3da2069d6645d9deb8c017a876472a977' 2025-08-14T21:18:42.3887119Z Submodule path 'third_party/cpp-httplib': checked out '3af7f2c16147f3fbc6e4d717032daf505dc1652c' 2025-08-14T21:18:42.4314371Z Submodule path 'third_party/cpuinfo': checked out '5e3d2445e6a84d9599bee2bf78edbb4d80865e1d' 2025-08-14T21:18:42.4784370Z Submodule path 'third_party/cudnn_frontend': checked out 'f937055efc6d414d11f4c6577e3977fe74f35fb6' 2025-08-14T21:18:42.5350828Z Submodule path 'third_party/cutlass': checked out 'e51efbfe18fe4f4cbb66ab814c55bf4aa0185491' 2025-08-14T21:18:42.5946715Z Submodule path 'third_party/fbgemm': checked out '21c7d30c526c0f1ad873ecc632dca6cfa8a69067' 2025-08-14T21:18:42.6311059Z Submodule path 'third_party/fbgemm/external/asmjit': checked out 'a3199e8857792cd10b7589ff5d58343d2c9008ea' 2025-08-14T21:18:42.6809470Z Submodule path 'third_party/fbgemm/external/composable_kernel': checked out 'b1281b8b08d973a7064f864f47eeb30f3e2596e9' 2025-08-14T21:18:42.7206378Z Submodule path 'third_party/fbgemm/external/cpuinfo': checked out '6543fec09b2f04ac4a666882998b534afc9c1349' 2025-08-14T21:18:42.7712223Z Submodule path 'third_party/fbgemm/external/cutlass': checked out 'b40777404c174b9694a870bff5c13ce6b7f656ad' 2025-08-14T21:18:42.8119105Z Submodule path 'third_party/fbgemm/external/googletest': checked out '52eb8108c5bdec04579160ae17225d66034bd723' 2025-08-14T21:18:42.8498423Z Submodule path 'third_party/fbgemm/external/hipify_torch': checked out 'a4337c69fe0e2552a7b7b0669178926beeed828c' 2025-08-14T21:18:42.8968170Z Submodule path 'third_party/fbgemm/external/json': checked out '9cca280a4d0ccf0c08f47a99aa71d1b0e52f8d03' 2025-08-14T21:18:42.9463478Z Submodule path 'third_party/flash-attention': checked out '979702c87a8713a8e0a5e9fee122b90d2ef13be5' 2025-08-14T21:18:43.0077213Z Submodule path 'third_party/flash-attention/csrc/composable_kernel': checked out '888317e698e9803c62bd38568abc9e05d7709f33' 2025-08-14T21:18:43.0535505Z Submodule path 'third_party/flash-attention/csrc/cutlass': checked out 'c506e16788cb08416a4a57e11a9067beeee29420' 2025-08-14T21:18:43.1037348Z Submodule path 'third_party/flatbuffers': checked out 'a2cd1ea3b6d3fee220106b5fed3f7ce8da9eb757' 2025-08-14T21:18:43.1421524Z Submodule path 'third_party/fmt': checked out '40626af88bd7df9a5fb80be7b25ac85b122d6c21' 2025-08-14T21:18:43.1783687Z Submodule path 'third_party/gemmlowp/gemmlowp': checked out '3fb5c176c17c765a3492cd2f0321b0dab712f350' 2025-08-14T21:18:43.2205297Z Submodule path 'third_party/gloo': checked out 'c7b7b022c124d9643957d9bd55f57ac59fce8fa2' 2025-08-14T21:18:43.2627997Z Submodule path 'third_party/googletest': checked out '52eb8108c5bdec04579160ae17225d66034bd723' 2025-08-14T21:18:43.3049909Z Submodule path 'third_party/ideep': checked out '719d8e6cd7f7a0e01b155657526d693acf97c2b3' 2025-08-14T21:18:43.3699515Z Submodule path 'third_party/ideep/mkl-dnn': checked out '8d263e693366ef8db40acc569cc7d8edf644556d' 2025-08-14T21:18:43.4145414Z Submodule path 'third_party/ittapi': checked out 'dec1d23ca65ab069d225dfe40dea14f455170959' 2025-08-14T21:18:43.4596965Z Submodule path 'third_party/kineto': checked out '5e7501833f1021ce6f618572d3baf657b6319658' 2025-08-14T21:18:43.5022010Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog': checked out '7d04a0053a845370ae06ce317a22a48e9edcc74e' 2025-08-14T21:18:43.5418039Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM': checked out 'ffde4e54bc7249a6039a5e6b45b395141e1217f9' 2025-08-14T21:18:43.5805114Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr': checked out '871ed52d350214a034f6ef8a3b8f51c5ce1bd400' 2025-08-14T21:18:43.6224140Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt': checked out 'cd4af11efc9c622896a3e4cb599fa28668ca3d05' 2025-08-14T21:18:43.6615327Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags': checked out 'e171aa2d15ed9eb17054558e0b3a6a413bb01067' 2025-08-14T21:18:43.6986912Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc': checked out '8411df715cf522606e3b1aca386ddfc0b63d34b4' 2025-08-14T21:18:43.7355408Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog': checked out 'b33e3bad4c46c8a6345525fd822af355e5ef9446' 2025-08-14T21:18:43.7741127Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest': checked out '58d77fa8070e8cec2dc1ed015d66b454c8d78850' 2025-08-14T21:18:43.8216635Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/json': checked out '4f8fba14066156b73f1189a2b8bd568bde5284c5' 2025-08-14T21:18:43.8598518Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs': checked out 'f68a2fa8ea36c783bdd760371411fcb495aa3150' 2025-08-14T21:18:43.9004815Z Submodule path 'third_party/kineto/libkineto/third_party/fmt': checked out '0041a40c1350ba702d475b9c4ad62da77caea164' 2025-08-14T21:18:43.9389569Z Submodule path 'third_party/kineto/libkineto/third_party/googletest': checked out '7aca84427f224eeed3144123d5230d5871e93347' 2025-08-14T21:18:43.9837350Z Submodule path 'third_party/kleidiai': checked out 'cca02c2f69dd18e1f12647c1c0bdc8cf90e680c7' 2025-08-14T21:18:44.0292284Z Submodule path 'third_party/mimalloc': checked out 'fbd8b99c2b828428947d70fdc046bb55609be93e' 2025-08-14T21:18:44.0805281Z Submodule path 'third_party/nlohmann': checked out '55f93686c01528224f448c19128836e7df245f72' 2025-08-14T21:18:44.1526567Z Submodule path 'third_party/onnx': checked out 'e709452ef2bbc1d113faf678c24e6d3467696e83' 2025-08-14T21:18:44.2019371Z Submodule path 'third_party/onnx/third_party/pybind11': checked out 'a2e59f0e7065404b44dfe92a28aca47ba1378dc4' 2025-08-14T21:18:44.2537995Z Submodule path 'third_party/opentelemetry-cpp': checked out 'a799f4aed9c94b765dcdaabaeab7d5e7e2310878' 2025-08-14T21:18:44.2930096Z Submodule path 'third_party/opentelemetry-cpp/third_party/benchmark': checked out 'd572f4777349d43653b21d6c2fc63020ab326db2' 2025-08-14T21:18:44.3328735Z Submodule path 'third_party/opentelemetry-cpp/third_party/googletest': checked out 'b796f7d44681514f58a683a3a71ff17c94edb0c1' 2025-08-14T21:18:44.3705911Z Submodule path 'third_party/opentelemetry-cpp/third_party/ms-gsl': checked out '6f4529395c5b7c2d661812257cd6780c67e54afa' 2025-08-14T21:18:44.4145502Z Submodule path 'third_party/opentelemetry-cpp/third_party/nlohmann-json': checked out 'bc889afb4c5bf1c0d8ee29ef35eaaf4c8bef8a5d' 2025-08-14T21:18:44.4521753Z Submodule path 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto': checked out '4ca4f0335c63cda7ab31ea7ed70d6553aee14dce' 2025-08-14T21:18:44.4888982Z Submodule path 'third_party/opentelemetry-cpp/third_party/opentracing-cpp': checked out '06b57f48ded1fa3bdd3d4346f6ef29e40e08eaf5' 2025-08-14T21:18:44.5280989Z Submodule path 'third_party/opentelemetry-cpp/third_party/prometheus-cpp': checked out 'c9ffcdda9086ffd9e1283ea7a0276d831f3c8a8d' 2025-08-14T21:18:44.5699811Z Submodule path 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb': checked out 'eefb26f82b233268fc98577d265352720d477ba4' 2025-08-14T21:18:44.6048840Z Submodule path 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest': checked out 'e2239ee6043f73722e7aa812a459f54a28552929' 2025-08-14T21:18:44.6733660Z Submodule path 'third_party/opentelemetry-cpp/tools/vcpkg': checked out '8eb57355a4ffb410a2e94c07b4dca2dffbee8e50' 2025-08-14T21:18:44.7159310Z Submodule path 'third_party/pocketfft': checked out '0fa0ef591e38c2758e3184c6c23e497b9f732ffa' 2025-08-14T21:18:44.7917006Z Submodule path 'third_party/protobuf': checked out 'd1eca4e4b421cd2997495c4b4e65cea6be4e9b8a' 2025-08-14T21:18:44.8316150Z Submodule path 'third_party/protobuf/third_party/benchmark': checked out '5b7683f49e1e9223cf9927b24f6fd3d6bd82e3f8' 2025-08-14T21:18:44.8713819Z Submodule path 'third_party/protobuf/third_party/googletest': checked out '5ec7f0c4a113e2f18ac2c6cc7df51ad6afc24081' 2025-08-14T21:18:44.9132891Z Submodule path 'third_party/psimd': checked out '072586a71b55b7f8c584153d223e95687148a900' 2025-08-14T21:18:44.9554187Z Submodule path 'third_party/pthreadpool': checked out '4fe0e1e183925bf8cfa6aae24237e724a96479b8' 2025-08-14T21:18:45.0016879Z Submodule path 'third_party/pybind11': checked out 'a2e59f0e7065404b44dfe92a28aca47ba1378dc4' 2025-08-14T21:18:45.0399155Z Submodule path 'third_party/python-peachpy': checked out 'f45429b087dd7d5bc78bb40dc7cf06425c252d67' 2025-08-14T21:18:45.0827693Z Submodule path 'third_party/sleef': checked out '5a1d179df9cf652951b59010a2d2075372d67f68' 2025-08-14T21:18:45.1264372Z Submodule path 'third_party/tensorpipe': checked out 'dacda0567d9f23d4bc503e1c4f84aa65f33ac38a' 2025-08-14T21:18:45.1648702Z Submodule path 'third_party/tensorpipe/third_party/googletest': checked out 'aee0f9d9b5b87796ee8a0ab26b7587ec30e8858e' 2025-08-14T21:18:45.2019156Z Submodule path 'third_party/tensorpipe/third_party/libnop': checked out '910b55815be16109f04f4180e9adee14fb4ce281' 2025-08-14T21:18:45.2630892Z Submodule path 'third_party/tensorpipe/third_party/libuv': checked out '5152db2cbfeb5582e9c27c5ea1dba2cd9e10759b' 2025-08-14T21:18:45.3057303Z Submodule path 'third_party/tensorpipe/third_party/pybind11': checked out 'a23996fce38ff6ccfbcdc09f1e63f2c4be5ea2ef' 2025-08-14T21:18:45.3427296Z Submodule path 'third_party/tensorpipe/third_party/pybind11/tools/clang': checked out '6a00cbc4a9b8e68b71caf7f774b3f9c753ae84d5' 2025-08-14T21:18:45.3585431Z [command]/usr/bin/git submodule foreach --recursive git config --local gc.auto 0 2025-08-14T21:18:45.4029174Z Entering 'android/libs/fbjni' 2025-08-14T21:18:45.4098979Z Entering 'third_party/FP16' 2025-08-14T21:18:45.4172798Z Entering 'third_party/FXdiv' 2025-08-14T21:18:45.4221618Z Entering 'third_party/NNPACK' 2025-08-14T21:18:45.4268639Z Entering 'third_party/NVTX' 2025-08-14T21:18:45.4307876Z Entering 'third_party/VulkanMemoryAllocator' 2025-08-14T21:18:45.4356049Z Entering 'third_party/XNNPACK' 2025-08-14T21:18:45.4427855Z Entering 'third_party/aiter' 2025-08-14T21:18:45.4479968Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-08-14T21:18:45.4558488Z Entering 'third_party/benchmark' 2025-08-14T21:18:45.4615462Z Entering 'third_party/composable_kernel' 2025-08-14T21:18:45.4692509Z Entering 'third_party/cpp-httplib' 2025-08-14T21:18:45.4740029Z Entering 'third_party/cpuinfo' 2025-08-14T21:18:45.4808187Z Entering 'third_party/cudnn_frontend' 2025-08-14T21:18:45.4860100Z Entering 'third_party/cutlass' 2025-08-14T21:18:45.4925916Z Entering 'third_party/fbgemm' 2025-08-14T21:18:45.4998274Z Entering 'third_party/fbgemm/external/asmjit' 2025-08-14T21:18:45.5054794Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-08-14T21:18:45.5116715Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-08-14T21:18:45.5150886Z Entering 'third_party/fbgemm/external/cutlass' 2025-08-14T21:18:45.5190553Z Entering 'third_party/fbgemm/external/googletest' 2025-08-14T21:18:45.5236530Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-08-14T21:18:45.5293245Z Entering 'third_party/fbgemm/external/json' 2025-08-14T21:18:45.5330678Z Entering 'third_party/flash-attention' 2025-08-14T21:18:45.5365735Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-08-14T21:18:45.5428550Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-08-14T21:18:45.5496893Z Entering 'third_party/flatbuffers' 2025-08-14T21:18:45.5548342Z Entering 'third_party/fmt' 2025-08-14T21:18:45.5607511Z Entering 'third_party/gemmlowp/gemmlowp' 2025-08-14T21:18:45.5646736Z Entering 'third_party/gloo' 2025-08-14T21:18:45.5680683Z Entering 'third_party/googletest' 2025-08-14T21:18:45.5733902Z Entering 'third_party/ideep' 2025-08-14T21:18:45.5828627Z Entering 'third_party/ideep/mkl-dnn' 2025-08-14T21:18:45.5897919Z Entering 'third_party/ittapi' 2025-08-14T21:18:45.5942177Z Entering 'third_party/kineto' 2025-08-14T21:18:45.5984204Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-08-14T21:18:45.6037840Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-08-14T21:18:45.6096691Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-08-14T21:18:45.6151616Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-08-14T21:18:45.6212825Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-08-14T21:18:45.6260548Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-08-14T21:18:45.6310444Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-08-14T21:18:45.6349716Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-08-14T21:18:45.6391291Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-08-14T21:18:45.6426105Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-08-14T21:18:45.6463397Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-08-14T21:18:45.6504392Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-08-14T21:18:45.6550169Z Entering 'third_party/kleidiai' 2025-08-14T21:18:45.6598454Z Entering 'third_party/mimalloc' 2025-08-14T21:18:45.6644771Z Entering 'third_party/nlohmann' 2025-08-14T21:18:45.6680801Z Entering 'third_party/onnx' 2025-08-14T21:18:45.6735248Z Entering 'third_party/onnx/third_party/pybind11' 2025-08-14T21:18:45.6779318Z Entering 'third_party/opentelemetry-cpp' 2025-08-14T21:18:45.6817762Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-08-14T21:18:45.6858904Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-08-14T21:18:45.6905476Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-08-14T21:18:45.6938156Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-08-14T21:18:45.6972234Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-08-14T21:18:45.7004922Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-08-14T21:18:45.7049223Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-08-14T21:18:45.7086943Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-08-14T21:18:45.7133401Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-08-14T21:18:45.7178577Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-08-14T21:18:45.7266674Z Entering 'third_party/pocketfft' 2025-08-14T21:18:45.7312150Z Entering 'third_party/protobuf' 2025-08-14T21:18:45.7373000Z Entering 'third_party/protobuf/third_party/benchmark' 2025-08-14T21:18:45.7410363Z Entering 'third_party/protobuf/third_party/googletest' 2025-08-14T21:18:45.7448706Z Entering 'third_party/psimd' 2025-08-14T21:18:45.7492636Z Entering 'third_party/pthreadpool' 2025-08-14T21:18:45.7531887Z Entering 'third_party/pybind11' 2025-08-14T21:18:45.7565933Z Entering 'third_party/python-peachpy' 2025-08-14T21:18:45.7599841Z Entering 'third_party/sleef' 2025-08-14T21:18:45.7639434Z Entering 'third_party/tensorpipe' 2025-08-14T21:18:45.7677919Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-08-14T21:18:45.7723381Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-08-14T21:18:45.7773878Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-08-14T21:18:45.7817311Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-08-14T21:18:45.7875280Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-08-14T21:18:45.7952556Z ##[endgroup] 2025-08-14T21:18:45.7953310Z ##[group]Persisting credentials for submodules 2025-08-14T21:18:45.7965342Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'url\.https\:\/\/github\.com\/\.insteadOf' && git config --local --unset-all 'url.https://github.com/.insteadOf' || :" 2025-08-14T21:18:45.8356439Z Entering 'android/libs/fbjni' 2025-08-14T21:18:45.8390263Z url.https://github.com/.insteadof 2025-08-14T21:18:45.8390908Z url.https://github.com/.insteadof 2025-08-14T21:18:45.8431750Z Entering 'third_party/FP16' 2025-08-14T21:18:45.8455196Z url.https://github.com/.insteadof 2025-08-14T21:18:45.8455853Z url.https://github.com/.insteadof 2025-08-14T21:18:45.8505554Z Entering 'third_party/FXdiv' 2025-08-14T21:18:45.8526529Z url.https://github.com/.insteadof 2025-08-14T21:18:45.8527188Z url.https://github.com/.insteadof 2025-08-14T21:18:45.8558849Z Entering 'third_party/NNPACK' 2025-08-14T21:18:45.8579450Z url.https://github.com/.insteadof 2025-08-14T21:18:45.8580112Z url.https://github.com/.insteadof 2025-08-14T21:18:45.8611988Z Entering 'third_party/NVTX' 2025-08-14T21:18:45.8632143Z url.https://github.com/.insteadof 2025-08-14T21:18:45.8632827Z url.https://github.com/.insteadof 2025-08-14T21:18:45.8668538Z Entering 'third_party/VulkanMemoryAllocator' 2025-08-14T21:18:45.8690965Z url.https://github.com/.insteadof 2025-08-14T21:18:45.8691627Z url.https://github.com/.insteadof 2025-08-14T21:18:45.8721031Z Entering 'third_party/XNNPACK' 2025-08-14T21:18:45.8746880Z url.https://github.com/.insteadof 2025-08-14T21:18:45.8747523Z url.https://github.com/.insteadof 2025-08-14T21:18:45.8796181Z Entering 'third_party/aiter' 2025-08-14T21:18:45.8823101Z url.https://github.com/.insteadof 2025-08-14T21:18:45.8823740Z url.https://github.com/.insteadof 2025-08-14T21:18:45.8859234Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-08-14T21:18:45.8896373Z url.https://github.com/.insteadof 2025-08-14T21:18:45.8897041Z url.https://github.com/.insteadof 2025-08-14T21:18:45.8939613Z Entering 'third_party/benchmark' 2025-08-14T21:18:45.8966889Z url.https://github.com/.insteadof 2025-08-14T21:18:45.8967541Z url.https://github.com/.insteadof 2025-08-14T21:18:45.9005717Z Entering 'third_party/composable_kernel' 2025-08-14T21:18:45.9025159Z url.https://github.com/.insteadof 2025-08-14T21:18:45.9025836Z url.https://github.com/.insteadof 2025-08-14T21:18:45.9064352Z Entering 'third_party/cpp-httplib' 2025-08-14T21:18:45.9094637Z url.https://github.com/.insteadof 2025-08-14T21:18:45.9095274Z url.https://github.com/.insteadof 2025-08-14T21:18:45.9126894Z Entering 'third_party/cpuinfo' 2025-08-14T21:18:45.9147252Z url.https://github.com/.insteadof 2025-08-14T21:18:45.9147902Z url.https://github.com/.insteadof 2025-08-14T21:18:45.9193437Z Entering 'third_party/cudnn_frontend' 2025-08-14T21:18:45.9218710Z url.https://github.com/.insteadof 2025-08-14T21:18:45.9219364Z url.https://github.com/.insteadof 2025-08-14T21:18:45.9252605Z Entering 'third_party/cutlass' 2025-08-14T21:18:45.9274501Z url.https://github.com/.insteadof 2025-08-14T21:18:45.9275186Z url.https://github.com/.insteadof 2025-08-14T21:18:45.9332986Z Entering 'third_party/fbgemm' 2025-08-14T21:18:45.9356605Z url.https://github.com/.insteadof 2025-08-14T21:18:45.9357245Z url.https://github.com/.insteadof 2025-08-14T21:18:45.9416012Z Entering 'third_party/fbgemm/external/asmjit' 2025-08-14T21:18:45.9447996Z url.https://github.com/.insteadof 2025-08-14T21:18:45.9448631Z url.https://github.com/.insteadof 2025-08-14T21:18:45.9488416Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-08-14T21:18:45.9524460Z url.https://github.com/.insteadof 2025-08-14T21:18:45.9525106Z url.https://github.com/.insteadof 2025-08-14T21:18:45.9578880Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-08-14T21:18:45.9613684Z url.https://github.com/.insteadof 2025-08-14T21:18:45.9614348Z url.https://github.com/.insteadof 2025-08-14T21:18:45.9647116Z Entering 'third_party/fbgemm/external/cutlass' 2025-08-14T21:18:45.9676642Z url.https://github.com/.insteadof 2025-08-14T21:18:45.9677303Z url.https://github.com/.insteadof 2025-08-14T21:18:45.9738569Z Entering 'third_party/fbgemm/external/googletest' 2025-08-14T21:18:45.9776180Z url.https://github.com/.insteadof 2025-08-14T21:18:45.9776824Z url.https://github.com/.insteadof 2025-08-14T21:18:45.9828013Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-08-14T21:18:45.9858847Z url.https://github.com/.insteadof 2025-08-14T21:18:45.9859508Z url.https://github.com/.insteadof 2025-08-14T21:18:45.9912883Z Entering 'third_party/fbgemm/external/json' 2025-08-14T21:18:45.9944146Z url.https://github.com/.insteadof 2025-08-14T21:18:45.9944784Z url.https://github.com/.insteadof 2025-08-14T21:18:46.0001618Z Entering 'third_party/flash-attention' 2025-08-14T21:18:46.0032151Z url.https://github.com/.insteadof 2025-08-14T21:18:46.0032803Z url.https://github.com/.insteadof 2025-08-14T21:18:46.0070733Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-08-14T21:18:46.0097392Z url.https://github.com/.insteadof 2025-08-14T21:18:46.0098046Z url.https://github.com/.insteadof 2025-08-14T21:18:46.0151997Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-08-14T21:18:46.0178981Z url.https://github.com/.insteadof 2025-08-14T21:18:46.0179638Z url.https://github.com/.insteadof 2025-08-14T21:18:46.0225189Z Entering 'third_party/flatbuffers' 2025-08-14T21:18:46.0258225Z url.https://github.com/.insteadof 2025-08-14T21:18:46.0258877Z url.https://github.com/.insteadof 2025-08-14T21:18:46.0306358Z Entering 'third_party/fmt' 2025-08-14T21:18:46.0338090Z url.https://github.com/.insteadof 2025-08-14T21:18:46.0338725Z url.https://github.com/.insteadof 2025-08-14T21:18:46.0396501Z Entering 'third_party/gemmlowp/gemmlowp' 2025-08-14T21:18:46.0427815Z url.https://github.com/.insteadof 2025-08-14T21:18:46.0428453Z url.https://github.com/.insteadof 2025-08-14T21:18:46.0460744Z Entering 'third_party/gloo' 2025-08-14T21:18:46.0481016Z url.https://github.com/.insteadof 2025-08-14T21:18:46.0481678Z url.https://github.com/.insteadof 2025-08-14T21:18:46.0528992Z Entering 'third_party/googletest' 2025-08-14T21:18:46.0557126Z url.https://github.com/.insteadof 2025-08-14T21:18:46.0557781Z url.https://github.com/.insteadof 2025-08-14T21:18:46.0612229Z Entering 'third_party/ideep' 2025-08-14T21:18:46.0638407Z url.https://github.com/.insteadof 2025-08-14T21:18:46.0639074Z url.https://github.com/.insteadof 2025-08-14T21:18:46.0686575Z Entering 'third_party/ideep/mkl-dnn' 2025-08-14T21:18:46.0719142Z url.https://github.com/.insteadof 2025-08-14T21:18:46.0719800Z url.https://github.com/.insteadof 2025-08-14T21:18:46.0774912Z Entering 'third_party/ittapi' 2025-08-14T21:18:46.0801836Z url.https://github.com/.insteadof 2025-08-14T21:18:46.0802505Z url.https://github.com/.insteadof 2025-08-14T21:18:46.0843481Z Entering 'third_party/kineto' 2025-08-14T21:18:46.0876882Z url.https://github.com/.insteadof 2025-08-14T21:18:46.0877544Z url.https://github.com/.insteadof 2025-08-14T21:18:46.0919162Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-08-14T21:18:46.0944856Z url.https://github.com/.insteadof 2025-08-14T21:18:46.0945548Z url.https://github.com/.insteadof 2025-08-14T21:18:46.1002299Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-08-14T21:18:46.1034878Z url.https://github.com/.insteadof 2025-08-14T21:18:46.1036021Z url.https://github.com/.insteadof 2025-08-14T21:18:46.1080596Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-08-14T21:18:46.1112320Z url.https://github.com/.insteadof 2025-08-14T21:18:46.1112975Z url.https://github.com/.insteadof 2025-08-14T21:18:46.1159743Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-08-14T21:18:46.1187086Z url.https://github.com/.insteadof 2025-08-14T21:18:46.1187754Z url.https://github.com/.insteadof 2025-08-14T21:18:46.1227456Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-08-14T21:18:46.1246962Z url.https://github.com/.insteadof 2025-08-14T21:18:46.1247634Z url.https://github.com/.insteadof 2025-08-14T21:18:46.1276588Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-08-14T21:18:46.1300501Z url.https://github.com/.insteadof 2025-08-14T21:18:46.1301161Z url.https://github.com/.insteadof 2025-08-14T21:18:46.1339418Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-08-14T21:18:46.1360800Z url.https://github.com/.insteadof 2025-08-14T21:18:46.1361447Z url.https://github.com/.insteadof 2025-08-14T21:18:46.1398859Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-08-14T21:18:46.1417982Z url.https://github.com/.insteadof 2025-08-14T21:18:46.1418650Z url.https://github.com/.insteadof 2025-08-14T21:18:46.1452632Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-08-14T21:18:46.1473593Z url.https://github.com/.insteadof 2025-08-14T21:18:46.1474250Z url.https://github.com/.insteadof 2025-08-14T21:18:46.1507426Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-08-14T21:18:46.1526219Z url.https://github.com/.insteadof 2025-08-14T21:18:46.1526886Z url.https://github.com/.insteadof 2025-08-14T21:18:46.1564171Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-08-14T21:18:46.1583188Z url.https://github.com/.insteadof 2025-08-14T21:18:46.1583861Z url.https://github.com/.insteadof 2025-08-14T21:18:46.1620124Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-08-14T21:18:46.1642118Z url.https://github.com/.insteadof 2025-08-14T21:18:46.1642758Z url.https://github.com/.insteadof 2025-08-14T21:18:46.1678114Z Entering 'third_party/kleidiai' 2025-08-14T21:18:46.1705124Z url.https://github.com/.insteadof 2025-08-14T21:18:46.1705792Z url.https://github.com/.insteadof 2025-08-14T21:18:46.1744265Z Entering 'third_party/mimalloc' 2025-08-14T21:18:46.1775892Z url.https://github.com/.insteadof 2025-08-14T21:18:46.1776552Z url.https://github.com/.insteadof 2025-08-14T21:18:46.1822601Z Entering 'third_party/nlohmann' 2025-08-14T21:18:46.1857692Z url.https://github.com/.insteadof 2025-08-14T21:18:46.1858330Z url.https://github.com/.insteadof 2025-08-14T21:18:46.1906775Z Entering 'third_party/onnx' 2025-08-14T21:18:46.1927824Z url.https://github.com/.insteadof 2025-08-14T21:18:46.1928483Z url.https://github.com/.insteadof 2025-08-14T21:18:46.1996749Z Entering 'third_party/onnx/third_party/pybind11' 2025-08-14T21:18:46.2028697Z url.https://github.com/.insteadof 2025-08-14T21:18:46.2029355Z url.https://github.com/.insteadof 2025-08-14T21:18:46.2072323Z Entering 'third_party/opentelemetry-cpp' 2025-08-14T21:18:46.2099502Z url.https://github.com/.insteadof 2025-08-14T21:18:46.2100152Z url.https://github.com/.insteadof 2025-08-14T21:18:46.2146803Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-08-14T21:18:46.2170896Z url.https://github.com/.insteadof 2025-08-14T21:18:46.2171572Z url.https://github.com/.insteadof 2025-08-14T21:18:46.2213169Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-08-14T21:18:46.2232319Z url.https://github.com/.insteadof 2025-08-14T21:18:46.2233015Z url.https://github.com/.insteadof 2025-08-14T21:18:46.2270523Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-08-14T21:18:46.2296987Z url.https://github.com/.insteadof 2025-08-14T21:18:46.2297976Z url.https://github.com/.insteadof 2025-08-14T21:18:46.2332590Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-08-14T21:18:46.2353794Z url.https://github.com/.insteadof 2025-08-14T21:18:46.2354319Z url.https://github.com/.insteadof 2025-08-14T21:18:46.2388006Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-08-14T21:18:46.2406826Z url.https://github.com/.insteadof 2025-08-14T21:18:46.2407350Z url.https://github.com/.insteadof 2025-08-14T21:18:46.2436809Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-08-14T21:18:46.2459757Z url.https://github.com/.insteadof 2025-08-14T21:18:46.2460274Z url.https://github.com/.insteadof 2025-08-14T21:18:46.2506319Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-08-14T21:18:46.2533250Z url.https://github.com/.insteadof 2025-08-14T21:18:46.2533799Z url.https://github.com/.insteadof 2025-08-14T21:18:46.2585370Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-08-14T21:18:46.2618149Z url.https://github.com/.insteadof 2025-08-14T21:18:46.2618808Z url.https://github.com/.insteadof 2025-08-14T21:18:46.2665046Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-08-14T21:18:46.2689257Z url.https://github.com/.insteadof 2025-08-14T21:18:46.2689892Z url.https://github.com/.insteadof 2025-08-14T21:18:46.2736799Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-08-14T21:18:46.2760247Z url.https://github.com/.insteadof 2025-08-14T21:18:46.2761009Z url.https://github.com/.insteadof 2025-08-14T21:18:46.2836941Z Entering 'third_party/pocketfft' 2025-08-14T21:18:46.2869165Z url.https://github.com/.insteadof 2025-08-14T21:18:46.2869856Z url.https://github.com/.insteadof 2025-08-14T21:18:46.2906764Z Entering 'third_party/protobuf' 2025-08-14T21:18:46.2927046Z url.https://github.com/.insteadof 2025-08-14T21:18:46.2927702Z url.https://github.com/.insteadof 2025-08-14T21:18:46.2961899Z Entering 'third_party/protobuf/third_party/benchmark' 2025-08-14T21:18:46.2982863Z url.https://github.com/.insteadof 2025-08-14T21:18:46.2983514Z url.https://github.com/.insteadof 2025-08-14T21:18:46.3033479Z Entering 'third_party/protobuf/third_party/googletest' 2025-08-14T21:18:46.3062488Z url.https://github.com/.insteadof 2025-08-14T21:18:46.3063141Z url.https://github.com/.insteadof 2025-08-14T21:18:46.3109277Z Entering 'third_party/psimd' 2025-08-14T21:18:46.3136958Z url.https://github.com/.insteadof 2025-08-14T21:18:46.3137610Z url.https://github.com/.insteadof 2025-08-14T21:18:46.3171248Z Entering 'third_party/pthreadpool' 2025-08-14T21:18:46.3191149Z url.https://github.com/.insteadof 2025-08-14T21:18:46.3191813Z url.https://github.com/.insteadof 2025-08-14T21:18:46.3237433Z Entering 'third_party/pybind11' 2025-08-14T21:18:46.3264863Z url.https://github.com/.insteadof 2025-08-14T21:18:46.3265501Z url.https://github.com/.insteadof 2025-08-14T21:18:46.3299721Z Entering 'third_party/python-peachpy' 2025-08-14T21:18:46.3325784Z url.https://github.com/.insteadof 2025-08-14T21:18:46.3326445Z url.https://github.com/.insteadof 2025-08-14T21:18:46.3370162Z Entering 'third_party/sleef' 2025-08-14T21:18:46.3401474Z url.https://github.com/.insteadof 2025-08-14T21:18:46.3402128Z url.https://github.com/.insteadof 2025-08-14T21:18:46.3455302Z Entering 'third_party/tensorpipe' 2025-08-14T21:18:46.3485289Z url.https://github.com/.insteadof 2025-08-14T21:18:46.3485951Z url.https://github.com/.insteadof 2025-08-14T21:18:46.3535808Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-08-14T21:18:46.3568721Z url.https://github.com/.insteadof 2025-08-14T21:18:46.3569371Z url.https://github.com/.insteadof 2025-08-14T21:18:46.3615348Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-08-14T21:18:46.3649318Z url.https://github.com/.insteadof 2025-08-14T21:18:46.3649975Z url.https://github.com/.insteadof 2025-08-14T21:18:46.3681735Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-08-14T21:18:46.3705726Z url.https://github.com/.insteadof 2025-08-14T21:18:46.3706845Z url.https://github.com/.insteadof 2025-08-14T21:18:46.3741798Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-08-14T21:18:46.3760953Z url.https://github.com/.insteadof 2025-08-14T21:18:46.3761625Z url.https://github.com/.insteadof 2025-08-14T21:18:46.3798585Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-08-14T21:18:46.3822307Z url.https://github.com/.insteadof 2025-08-14T21:18:46.3822968Z url.https://github.com/.insteadof 2025-08-14T21:18:46.3882318Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local 'http.https://github.com/.extraheader' 'AUTHORIZATION: basic ***' && git config --local --show-origin --name-only --get-regexp remote.origin.url" 2025-08-14T21:18:46.4248731Z Entering 'android/libs/fbjni' 2025-08-14T21:18:46.4306976Z file:/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/android/libs/fbjni/config remote.origin.url 2025-08-14T21:18:46.4328211Z Entering 'third_party/FP16' 2025-08-14T21:18:46.4366872Z file:/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FP16/config remote.origin.url 2025-08-14T21:18:46.4386925Z Entering 'third_party/FXdiv' 2025-08-14T21:18:46.4452625Z file:/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FXdiv/config remote.origin.url 2025-08-14T21:18:46.4469589Z Entering 'third_party/NNPACK' 2025-08-14T21:18:46.4505611Z file:/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK/config remote.origin.url 2025-08-14T21:18:46.4526812Z Entering 'third_party/NVTX' 2025-08-14T21:18:46.4569512Z file:/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NVTX/config remote.origin.url 2025-08-14T21:18:46.4589421Z Entering 'third_party/VulkanMemoryAllocator' 2025-08-14T21:18:46.4625118Z file:/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/VulkanMemoryAllocator/config remote.origin.url 2025-08-14T21:18:46.4652455Z Entering 'third_party/XNNPACK' 2025-08-14T21:18:46.4705043Z file:/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/XNNPACK/config remote.origin.url 2025-08-14T21:18:46.4749908Z Entering 'third_party/aiter' 2025-08-14T21:18:46.4796782Z file:/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/config remote.origin.url 2025-08-14T21:18:46.4826039Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-08-14T21:18:46.4866159Z file:/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/modules/3rdparty/composable_kernel/config remote.origin.url 2025-08-14T21:18:46.4911450Z Entering 'third_party/benchmark' 2025-08-14T21:18:46.4965366Z file:/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/benchmark/config remote.origin.url 2025-08-14T21:18:46.4997786Z Entering 'third_party/composable_kernel' 2025-08-14T21:18:46.5041106Z file:/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/composable_kernel/config remote.origin.url 2025-08-14T21:18:46.5067404Z Entering 'third_party/cpp-httplib' 2025-08-14T21:18:46.5104156Z file:/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cpp-httplib/config remote.origin.url 2025-08-14T21:18:46.5123142Z Entering 'third_party/cpuinfo' 2025-08-14T21:18:46.5165512Z file:/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cpuinfo/config remote.origin.url 2025-08-14T21:18:46.5194942Z Entering 'third_party/cudnn_frontend' 2025-08-14T21:18:46.5255962Z file:/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cudnn_frontend/config remote.origin.url 2025-08-14T21:18:46.5286179Z Entering 'third_party/cutlass' 2025-08-14T21:18:46.5345163Z file:/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cutlass/config remote.origin.url 2025-08-14T21:18:46.5387467Z Entering 'third_party/fbgemm' 2025-08-14T21:18:46.5442916Z file:/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/config remote.origin.url 2025-08-14T21:18:46.5467113Z Entering 'third_party/fbgemm/external/asmjit' 2025-08-14T21:18:46.5521736Z file:/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/asmjit/config remote.origin.url 2025-08-14T21:18:46.5551593Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-08-14T21:18:46.5603896Z file:/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/composable_kernel/config remote.origin.url 2025-08-14T21:18:46.5634626Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-08-14T21:18:46.5685218Z file:/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cpuinfo/config remote.origin.url 2025-08-14T21:18:46.5709449Z Entering 'third_party/fbgemm/external/cutlass' 2025-08-14T21:18:46.5762127Z file:/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cutlass/config remote.origin.url 2025-08-14T21:18:46.5805490Z Entering 'third_party/fbgemm/external/googletest' 2025-08-14T21:18:46.5850535Z file:/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/googletest/config remote.origin.url 2025-08-14T21:18:46.5870190Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-08-14T21:18:46.5920577Z file:/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/hipify_torch/config remote.origin.url 2025-08-14T21:18:46.5946776Z Entering 'third_party/fbgemm/external/json' 2025-08-14T21:18:46.5982980Z file:/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/json/config remote.origin.url 2025-08-14T21:18:46.6004933Z Entering 'third_party/flash-attention' 2025-08-14T21:18:46.6055308Z file:/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/config remote.origin.url 2025-08-14T21:18:46.6084321Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-08-14T21:18:46.6143792Z file:/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/composable_kernel/config remote.origin.url 2025-08-14T21:18:46.6185304Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-08-14T21:18:46.6241692Z file:/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/cutlass/config remote.origin.url 2025-08-14T21:18:46.6288248Z Entering 'third_party/flatbuffers' 2025-08-14T21:18:46.6330308Z file:/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flatbuffers/config remote.origin.url 2025-08-14T21:18:46.6364600Z Entering 'third_party/fmt' 2025-08-14T21:18:46.6410450Z file:/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fmt/config remote.origin.url 2025-08-14T21:18:46.6433914Z Entering 'third_party/gemmlowp/gemmlowp' 2025-08-14T21:18:46.6483855Z file:/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/gemmlowp/gemmlowp/config remote.origin.url 2025-08-14T21:18:46.6515553Z Entering 'third_party/gloo' 2025-08-14T21:18:46.6564513Z file:/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/gloo/config remote.origin.url 2025-08-14T21:18:46.6597992Z Entering 'third_party/googletest' 2025-08-14T21:18:46.6649915Z file:/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/googletest/config remote.origin.url 2025-08-14T21:18:46.6678313Z Entering 'third_party/ideep' 2025-08-14T21:18:46.6726043Z file:/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/config remote.origin.url 2025-08-14T21:18:46.6748635Z Entering 'third_party/ideep/mkl-dnn' 2025-08-14T21:18:46.6796312Z file:/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/modules/mkl-dnn/config remote.origin.url 2025-08-14T21:18:46.6842885Z Entering 'third_party/ittapi' 2025-08-14T21:18:46.6885055Z file:/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ittapi/config remote.origin.url 2025-08-14T21:18:46.6908942Z Entering 'third_party/kineto' 2025-08-14T21:18:46.6959518Z file:/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/config remote.origin.url 2025-08-14T21:18:46.6988700Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-08-14T21:18:46.7041931Z file:/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/config remote.origin.url 2025-08-14T21:18:46.7061987Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-08-14T21:18:46.7102303Z file:/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/DCGM/config remote.origin.url 2025-08-14T21:18:46.7124643Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-08-14T21:18:46.7167751Z file:/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/cpr/config remote.origin.url 2025-08-14T21:18:46.7191070Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-08-14T21:18:46.7250183Z file:/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/fmt/config remote.origin.url 2025-08-14T21:18:46.7274648Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-08-14T21:18:46.7325437Z file:/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/config remote.origin.url 2025-08-14T21:18:46.7345047Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-08-14T21:18:46.7384552Z file:/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/modules/doc/config remote.origin.url 2025-08-14T21:18:46.7411033Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-08-14T21:18:46.7446406Z file:/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/glog/config remote.origin.url 2025-08-14T21:18:46.7470206Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-08-14T21:18:46.7532720Z file:/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/googletest/config remote.origin.url 2025-08-14T21:18:46.7558880Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-08-14T21:18:46.7613237Z file:/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/json/config remote.origin.url 2025-08-14T21:18:46.7647916Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-08-14T21:18:46.7687976Z file:/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/pfs/config remote.origin.url 2025-08-14T21:18:46.7710725Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-08-14T21:18:46.7760488Z file:/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/fmt/config remote.origin.url 2025-08-14T21:18:46.7791546Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-08-14T21:18:46.7842330Z file:/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/googletest/config remote.origin.url 2025-08-14T21:18:46.7874564Z Entering 'third_party/kleidiai' 2025-08-14T21:18:46.7927118Z file:/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kleidiai/config remote.origin.url 2025-08-14T21:18:46.7953432Z Entering 'third_party/mimalloc' 2025-08-14T21:18:46.8005904Z file:/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/mimalloc/config remote.origin.url 2025-08-14T21:18:46.8033950Z Entering 'third_party/nlohmann' 2025-08-14T21:18:46.8092705Z file:/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/nlohmann/config remote.origin.url 2025-08-14T21:18:46.8125739Z Entering 'third_party/onnx' 2025-08-14T21:18:46.8183268Z file:/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/config remote.origin.url 2025-08-14T21:18:46.8239932Z Entering 'third_party/onnx/third_party/pybind11' 2025-08-14T21:18:46.8283994Z file:/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/modules/third_party/pybind11/config remote.origin.url 2025-08-14T21:18:46.8316273Z Entering 'third_party/opentelemetry-cpp' 2025-08-14T21:18:46.8368899Z file:/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/config remote.origin.url 2025-08-14T21:18:46.8397725Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-08-14T21:18:46.8451363Z file:/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/benchmark/config remote.origin.url 2025-08-14T21:18:46.8479790Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-08-14T21:18:46.8538102Z file:/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/googletest/config remote.origin.url 2025-08-14T21:18:46.8560833Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-08-14T21:18:46.8606784Z file:/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/ms-gsl/config remote.origin.url 2025-08-14T21:18:46.8630673Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-08-14T21:18:46.8679523Z file:/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/nlohmann-json/config remote.origin.url 2025-08-14T21:18:46.8703232Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-08-14T21:18:46.8746593Z file:/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentelemetry-proto/config remote.origin.url 2025-08-14T21:18:46.8770572Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-08-14T21:18:46.8810005Z file:/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentracing-cpp/config remote.origin.url 2025-08-14T21:18:46.8830691Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-08-14T21:18:46.8885640Z file:/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/config remote.origin.url 2025-08-14T21:18:46.8914946Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-08-14T21:18:46.8971085Z file:/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/civetweb/config remote.origin.url 2025-08-14T21:18:46.8997327Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-08-14T21:18:46.9050776Z file:/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/googletest/config remote.origin.url 2025-08-14T21:18:46.9086998Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-08-14T21:18:46.9131433Z file:/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/tools/vcpkg/config remote.origin.url 2025-08-14T21:18:46.9188645Z Entering 'third_party/pocketfft' 2025-08-14T21:18:46.9229882Z file:/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/pocketfft/config remote.origin.url 2025-08-14T21:18:46.9249835Z Entering 'third_party/protobuf' 2025-08-14T21:18:46.9301596Z file:/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/config remote.origin.url 2025-08-14T21:18:46.9326002Z Entering 'third_party/protobuf/third_party/benchmark' 2025-08-14T21:18:46.9362385Z file:/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/benchmark/config remote.origin.url 2025-08-14T21:18:46.9386793Z Entering 'third_party/protobuf/third_party/googletest' 2025-08-14T21:18:46.9422634Z file:/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/googletest/config remote.origin.url 2025-08-14T21:18:46.9443642Z Entering 'third_party/psimd' 2025-08-14T21:18:46.9487538Z file:/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/psimd/config remote.origin.url 2025-08-14T21:18:46.9508506Z Entering 'third_party/pthreadpool' 2025-08-14T21:18:46.9568603Z file:/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/pthreadpool/config remote.origin.url 2025-08-14T21:18:46.9589759Z Entering 'third_party/pybind11' 2025-08-14T21:18:46.9649810Z file:/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/pybind11/config remote.origin.url 2025-08-14T21:18:46.9670634Z Entering 'third_party/python-peachpy' 2025-08-14T21:18:46.9706785Z file:/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/python-peachpy/config remote.origin.url 2025-08-14T21:18:46.9738306Z Entering 'third_party/sleef' 2025-08-14T21:18:46.9784716Z file:/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/sleef/config remote.origin.url 2025-08-14T21:18:46.9814552Z Entering 'third_party/tensorpipe' 2025-08-14T21:18:46.9856100Z file:/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/config remote.origin.url 2025-08-14T21:18:46.9887015Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-08-14T21:18:46.9925527Z file:/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/googletest/config remote.origin.url 2025-08-14T21:18:46.9945650Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-08-14T21:18:46.9984729Z file:/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libnop/config remote.origin.url 2025-08-14T21:18:47.0011886Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-08-14T21:18:47.0061831Z file:/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libuv/config remote.origin.url 2025-08-14T21:18:47.0089630Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-08-14T21:18:47.0140644Z file:/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/config remote.origin.url 2025-08-14T21:18:47.0160083Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-08-14T21:18:47.0212017Z file:/home/pytorchci/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/modules/tools/clang/config remote.origin.url 2025-08-14T21:18:47.0658261Z [command]/usr/bin/git submodule foreach --recursive git config --local --add 'url.https://github.com/.insteadOf' 'git@github.com:' 2025-08-14T21:18:47.1051783Z Entering 'android/libs/fbjni' 2025-08-14T21:18:47.1114456Z Entering 'third_party/FP16' 2025-08-14T21:18:47.1175490Z Entering 'third_party/FXdiv' 2025-08-14T21:18:47.1229424Z Entering 'third_party/NNPACK' 2025-08-14T21:18:47.1277256Z Entering 'third_party/NVTX' 2025-08-14T21:18:47.1328246Z Entering 'third_party/VulkanMemoryAllocator' 2025-08-14T21:18:47.1376768Z Entering 'third_party/XNNPACK' 2025-08-14T21:18:47.1449129Z Entering 'third_party/aiter' 2025-08-14T21:18:47.1503917Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-08-14T21:18:47.1578098Z Entering 'third_party/benchmark' 2025-08-14T21:18:47.1623393Z Entering 'third_party/composable_kernel' 2025-08-14T21:18:47.1688414Z Entering 'third_party/cpp-httplib' 2025-08-14T21:18:47.1738714Z Entering 'third_party/cpuinfo' 2025-08-14T21:18:47.1790412Z Entering 'third_party/cudnn_frontend' 2025-08-14T21:18:47.1845899Z Entering 'third_party/cutlass' 2025-08-14T21:18:47.1912385Z Entering 'third_party/fbgemm' 2025-08-14T21:18:47.1980577Z Entering 'third_party/fbgemm/external/asmjit' 2025-08-14T21:18:47.2040636Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-08-14T21:18:47.2112766Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-08-14T21:18:47.2173132Z Entering 'third_party/fbgemm/external/cutlass' 2025-08-14T21:18:47.2244194Z Entering 'third_party/fbgemm/external/googletest' 2025-08-14T21:18:47.2305453Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-08-14T21:18:47.2350730Z Entering 'third_party/fbgemm/external/json' 2025-08-14T21:18:47.2405277Z Entering 'third_party/flash-attention' 2025-08-14T21:18:47.2451767Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-08-14T21:18:47.2502883Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-08-14T21:18:47.2569257Z Entering 'third_party/flatbuffers' 2025-08-14T21:18:47.2620998Z Entering 'third_party/fmt' 2025-08-14T21:18:47.2663596Z Entering 'third_party/gemmlowp/gemmlowp' 2025-08-14T21:18:47.2715450Z Entering 'third_party/gloo' 2025-08-14T21:18:47.2778907Z Entering 'third_party/googletest' 2025-08-14T21:18:47.2835577Z Entering 'third_party/ideep' 2025-08-14T21:18:47.2886417Z Entering 'third_party/ideep/mkl-dnn' 2025-08-14T21:18:47.2945646Z Entering 'third_party/ittapi' 2025-08-14T21:18:47.2994406Z Entering 'third_party/kineto' 2025-08-14T21:18:47.3040627Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-08-14T21:18:47.3077804Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-08-14T21:18:47.3130216Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-08-14T21:18:47.3172753Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-08-14T21:18:47.3220369Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-08-14T21:18:47.3271697Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-08-14T21:18:47.3329217Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-08-14T21:18:47.3372398Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-08-14T21:18:47.3419982Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-08-14T21:18:47.3472705Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-08-14T21:18:47.3524310Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-08-14T21:18:47.3563755Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-08-14T21:18:47.3598364Z Entering 'third_party/kleidiai' 2025-08-14T21:18:47.3648547Z Entering 'third_party/mimalloc' 2025-08-14T21:18:47.3690024Z Entering 'third_party/nlohmann' 2025-08-14T21:18:47.3725663Z Entering 'third_party/onnx' 2025-08-14T21:18:47.3773787Z Entering 'third_party/onnx/third_party/pybind11' 2025-08-14T21:18:47.3828568Z Entering 'third_party/opentelemetry-cpp' 2025-08-14T21:18:47.3869699Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-08-14T21:18:47.3917264Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-08-14T21:18:47.3969129Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-08-14T21:18:47.4013958Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-08-14T21:18:47.4063944Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-08-14T21:18:47.4105855Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-08-14T21:18:47.4146074Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-08-14T21:18:47.4193413Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-08-14T21:18:47.4252884Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-08-14T21:18:47.4296427Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-08-14T21:18:47.4351687Z Entering 'third_party/pocketfft' 2025-08-14T21:18:47.4391001Z Entering 'third_party/protobuf' 2025-08-14T21:18:47.4430751Z Entering 'third_party/protobuf/third_party/benchmark' 2025-08-14T21:18:47.4465244Z Entering 'third_party/protobuf/third_party/googletest' 2025-08-14T21:18:47.4507588Z Entering 'third_party/psimd' 2025-08-14T21:18:47.4545130Z Entering 'third_party/pthreadpool' 2025-08-14T21:18:47.4582871Z Entering 'third_party/pybind11' 2025-08-14T21:18:47.4623099Z Entering 'third_party/python-peachpy' 2025-08-14T21:18:47.4660190Z Entering 'third_party/sleef' 2025-08-14T21:18:47.4695394Z Entering 'third_party/tensorpipe' 2025-08-14T21:18:47.4731830Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-08-14T21:18:47.4770077Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-08-14T21:18:47.4802605Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-08-14T21:18:47.4835585Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-08-14T21:18:47.4867862Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-08-14T21:18:47.4944872Z [command]/usr/bin/git submodule foreach --recursive git config --local --add 'url.https://github.com/.insteadOf' 'org-21003710@github.com:' 2025-08-14T21:18:47.5337481Z Entering 'android/libs/fbjni' 2025-08-14T21:18:47.5408697Z Entering 'third_party/FP16' 2025-08-14T21:18:47.5457625Z Entering 'third_party/FXdiv' 2025-08-14T21:18:47.5506570Z Entering 'third_party/NNPACK' 2025-08-14T21:18:47.5569949Z Entering 'third_party/NVTX' 2025-08-14T21:18:47.5618261Z Entering 'third_party/VulkanMemoryAllocator' 2025-08-14T21:18:47.5664636Z Entering 'third_party/XNNPACK' 2025-08-14T21:18:47.5740270Z Entering 'third_party/aiter' 2025-08-14T21:18:47.5795983Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-08-14T21:18:47.5874494Z Entering 'third_party/benchmark' 2025-08-14T21:18:47.5937617Z Entering 'third_party/composable_kernel' 2025-08-14T21:18:47.6003208Z Entering 'third_party/cpp-httplib' 2025-08-14T21:18:47.6054494Z Entering 'third_party/cpuinfo' 2025-08-14T21:18:47.6107448Z Entering 'third_party/cudnn_frontend' 2025-08-14T21:18:47.6167420Z Entering 'third_party/cutlass' 2025-08-14T21:18:47.6247007Z Entering 'third_party/fbgemm' 2025-08-14T21:18:47.6304687Z Entering 'third_party/fbgemm/external/asmjit' 2025-08-14T21:18:47.6370926Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-08-14T21:18:47.6443154Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-08-14T21:18:47.6502581Z Entering 'third_party/fbgemm/external/cutlass' 2025-08-14T21:18:47.6578233Z Entering 'third_party/fbgemm/external/googletest' 2025-08-14T21:18:47.6639133Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-08-14T21:18:47.6697866Z Entering 'third_party/fbgemm/external/json' 2025-08-14T21:18:47.6763987Z Entering 'third_party/flash-attention' 2025-08-14T21:18:47.6829003Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-08-14T21:18:47.6902999Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-08-14T21:18:47.6978892Z Entering 'third_party/flatbuffers' 2025-08-14T21:18:47.7028649Z Entering 'third_party/fmt' 2025-08-14T21:18:47.7078953Z Entering 'third_party/gemmlowp/gemmlowp' 2025-08-14T21:18:47.7145077Z Entering 'third_party/gloo' 2025-08-14T21:18:47.7200739Z Entering 'third_party/googletest' 2025-08-14T21:18:47.7251298Z Entering 'third_party/ideep' 2025-08-14T21:18:47.7298197Z Entering 'third_party/ideep/mkl-dnn' 2025-08-14T21:18:47.7377926Z Entering 'third_party/ittapi' 2025-08-14T21:18:47.7423534Z Entering 'third_party/kineto' 2025-08-14T21:18:47.7469834Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-08-14T21:18:47.7530316Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-08-14T21:18:47.7590743Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-08-14T21:18:47.7645849Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-08-14T21:18:47.7685533Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-08-14T21:18:47.7723537Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-08-14T21:18:47.7781479Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-08-14T21:18:47.7836104Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-08-14T21:18:47.7887969Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-08-14T21:18:47.7934890Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-08-14T21:18:47.7988347Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-08-14T21:18:47.8046305Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-08-14T21:18:47.8100117Z Entering 'third_party/kleidiai' 2025-08-14T21:18:47.8147680Z Entering 'third_party/mimalloc' 2025-08-14T21:18:47.8195616Z Entering 'third_party/nlohmann' 2025-08-14T21:18:47.8238273Z Entering 'third_party/onnx' 2025-08-14T21:18:47.8302891Z Entering 'third_party/onnx/third_party/pybind11' 2025-08-14T21:18:47.8372057Z Entering 'third_party/opentelemetry-cpp' 2025-08-14T21:18:47.8421095Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-08-14T21:18:47.8478728Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-08-14T21:18:47.8535224Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-08-14T21:18:47.8594901Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-08-14T21:18:47.8653754Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-08-14T21:18:47.8715109Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-08-14T21:18:47.8769699Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-08-14T21:18:47.8826148Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-08-14T21:18:47.8883703Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-08-14T21:18:47.8948807Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-08-14T21:18:47.9027118Z Entering 'third_party/pocketfft' 2025-08-14T21:18:47.9085702Z Entering 'third_party/protobuf' 2025-08-14T21:18:47.9149482Z Entering 'third_party/protobuf/third_party/benchmark' 2025-08-14T21:18:47.9209686Z Entering 'third_party/protobuf/third_party/googletest' 2025-08-14T21:18:47.9268903Z Entering 'third_party/psimd' 2025-08-14T21:18:47.9313833Z Entering 'third_party/pthreadpool' 2025-08-14T21:18:47.9370746Z Entering 'third_party/pybind11' 2025-08-14T21:18:47.9421633Z Entering 'third_party/python-peachpy' 2025-08-14T21:18:47.9474859Z Entering 'third_party/sleef' 2025-08-14T21:18:47.9530578Z Entering 'third_party/tensorpipe' 2025-08-14T21:18:47.9577426Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-08-14T21:18:47.9635949Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-08-14T21:18:47.9693811Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-08-14T21:18:47.9739041Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-08-14T21:18:47.9783108Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-08-14T21:18:47.9858290Z ##[endgroup] 2025-08-14T21:18:47.9930104Z [command]/usr/bin/git log -1 --format=%H 2025-08-14T21:18:47.9976781Z 1fc683cf17c8c673044538d10266c00f92987be2 2025-08-14T21:18:48.0317092Z Prepare all required actions 2025-08-14T21:18:48.0318024Z Getting action download info 2025-08-14T21:18:48.2177767Z ##[group]Run ./.github/actions/setup-rocm 2025-08-14T21:18:48.2178313Z env: 2025-08-14T21:18:48.2178661Z GIT_DEFAULT_BRANCH: main 2025-08-14T21:18:48.2179095Z ##[endgroup] 2025-08-14T21:18:48.2221074Z ##[group]Run dpkg -l | grep -E " rocm" 2025-08-14T21:18:48.2221652Z dpkg -l | grep -E " rocm" 2025-08-14T21:18:48.2273925Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-08-14T21:18:48.2274565Z env: 2025-08-14T21:18:48.2274926Z GIT_DEFAULT_BRANCH: main 2025-08-14T21:18:48.2275350Z ##[endgroup] 2025-08-14T21:18:48.2569401Z ii rocm 6.2.1.60201-112~22.04 amd64 Radeon Open Compute (ROCm) software stack meta package 2025-08-14T21:18:48.2570503Z ii rocm-cmake 0.13.0.60201-112~22.04 amd64 rocm-cmake built using CMake 2025-08-14T21:18:48.2572199Z ii rocm-core 6.2.1.60201-112~22.04 amd64 Radeon Open Compute (ROCm) Runtime software stack 2025-08-14T21:18:48.2573488Z ii rocm-dbgapi 0.76.0.60201-112~22.04 amd64 Library to provide AMD GPU debugger API 2025-08-14T21:18:48.2574831Z ii rocm-debug-agent 2.0.3.60201-112~22.04 amd64 Radeon Open Compute Debug Agent (ROCdebug-agent) 2025-08-14T21:18:48.2576706Z ii rocm-developer-tools 6.2.1.60201-112~22.04 amd64 Radeon Open Compute (ROCm) Runtime software stack 2025-08-14T21:18:48.2578190Z ii rocm-device-libs 1.0.0.60201-112~22.04 amd64 Radeon Open Compute - device libraries 2025-08-14T21:18:48.2579210Z ii rocm-gdb 14.2.60201-112~22.04 amd64 ROCgdb 2025-08-14T21:18:48.2580272Z ii rocm-hip-libraries 6.2.1.60201-112~22.04 amd64 Radeon Open Compute (ROCm) Runtime software stack 2025-08-14T21:18:48.2581484Z ii rocm-hip-runtime 6.2.1.60201-112~22.04 amd64 Radeon Open Compute (ROCm) Runtime software stack 2025-08-14T21:18:48.2582689Z ii rocm-hip-runtime-dev 6.2.1.60201-112~22.04 amd64 Radeon Open Compute (ROCm) Runtime software stack 2025-08-14T21:18:48.2583845Z ii rocm-hip-sdk 6.2.1.60201-112~22.04 amd64 Radeon Open Compute (ROCm) Runtime software stack 2025-08-14T21:18:48.2585009Z ii rocm-language-runtime 6.2.1.60201-112~22.04 amd64 Radeon Open Compute (ROCm) Runtime software stack 2025-08-14T21:18:48.2586116Z ii rocm-llvm 18.0.0.24355.60201-112~22.04 amd64 ROCm core compiler 2025-08-14T21:18:48.2587172Z ii rocm-ml-libraries 6.2.1.60201-112~22.04 amd64 Radeon Open Compute (ROCm) Runtime software stack 2025-08-14T21:18:48.2588317Z ii rocm-ml-sdk 6.2.1.60201-112~22.04 amd64 Radeon Open Compute (ROCm) Runtime software stack 2025-08-14T21:18:48.2589336Z ii rocm-opencl 2.0.0.60201-112~22.04 amd64 clr built using CMake 2025-08-14T21:18:48.2590303Z ii rocm-opencl-dev 2.0.0.60201-112~22.04 amd64 clr built using CMake 2025-08-14T21:18:48.2591405Z ii rocm-opencl-icd-loader 1.2.60201-112~22.04 amd64 OpenCL-ICD-Loader built using CMake 2025-08-14T21:18:48.2592847Z ii rocm-opencl-runtime 6.2.1.60201-112~22.04 amd64 Radeon Open Compute (ROCm) Runtime software stack 2025-08-14T21:18:48.2596214Z ii rocm-opencl-sdk 6.2.1.60201-112~22.04 amd64 Radeon Open Compute (ROCm) Runtime software stack 2025-08-14T21:18:48.2597499Z ii rocm-openmp-sdk 6.2.1.60201-112~22.04 amd64 Radeon Open Compute (ROCm) OpenMP Software development Kit. 2025-08-14T21:18:48.2598677Z ii rocm-smi-lib 7.3.0.60201-112~22.04 amd64 AMD System Management libraries 2025-08-14T21:18:48.2599769Z ii rocm-utils 6.2.1.60201-112~22.04 amd64 Radeon Open Compute (ROCm) Runtime software stack 2025-08-14T21:18:48.2601002Z ii rocminfo 1.0.0.60201-112~22.04 amd64 Radeon Open Compute (ROCm) Runtime rocminfo tool 2025-08-14T21:18:48.2635805Z ##[group]Run # ignore expansion of "docker ps -q" since it could be empty 2025-08-14T21:18:48.2636727Z # ignore expansion of "docker ps -q" since it could be empty 2025-08-14T21:18:48.2637421Z # shellcheck disable=SC2046 2025-08-14T21:18:48.2638287Z docker stop $(docker ps -q) || true 2025-08-14T21:18:48.2638840Z # Prune all stopped containers. 2025-08-14T21:18:48.2639370Z docker container prune -f 2025-08-14T21:18:48.2685798Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-08-14T21:18:48.2686424Z env: 2025-08-14T21:18:48.2686802Z GIT_DEFAULT_BRANCH: main 2025-08-14T21:18:48.2687215Z ##[endgroup] 2025-08-14T21:18:48.3206620Z "docker stop" requires at least 1 argument. 2025-08-14T21:18:48.3207239Z See 'docker stop --help'. 2025-08-14T21:18:48.3207577Z 2025-08-14T21:18:48.3207858Z Usage: docker stop [OPTIONS] CONTAINER [CONTAINER...] 2025-08-14T21:18:48.3208281Z 2025-08-14T21:18:48.3208463Z Stop one or more running containers 2025-08-14T21:18:48.3441676Z Total reclaimed space: 0B 2025-08-14T21:18:48.3497017Z ##[group]Run cat /etc/os-release || true 2025-08-14T21:18:48.3497374Z cat /etc/os-release || true 2025-08-14T21:18:48.3497715Z cat /etc/apt/sources.list.d/rocm.list || true 2025-08-14T21:18:48.3498068Z cat /opt/rocm/.info/version || true 2025-08-14T21:18:48.3498330Z whoami 2025-08-14T21:18:48.3539414Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-08-14T21:18:48.3539754Z env: 2025-08-14T21:18:48.3539970Z GIT_DEFAULT_BRANCH: main 2025-08-14T21:18:48.3540212Z ##[endgroup] 2025-08-14T21:18:48.3596371Z PRETTY_NAME="Ubuntu 22.04.3 LTS" 2025-08-14T21:18:48.3596823Z NAME="Ubuntu" 2025-08-14T21:18:48.3597122Z VERSION_ID="22.04" 2025-08-14T21:18:48.3597474Z VERSION="22.04.3 LTS (Jammy Jellyfish)" 2025-08-14T21:18:48.3597891Z VERSION_CODENAME=jammy 2025-08-14T21:18:48.3598216Z ID=ubuntu 2025-08-14T21:18:48.3598476Z ID_LIKE=debian 2025-08-14T21:18:48.3598831Z HOME_URL="https://www.ubuntu.com/" 2025-08-14T21:18:48.3599261Z SUPPORT_URL="https://help.ubuntu.com/" 2025-08-14T21:18:48.3599772Z BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/" 2025-08-14T21:18:48.3600672Z PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy" 2025-08-14T21:18:48.3601357Z UBUNTU_CODENAME=jammy 2025-08-14T21:18:48.3609100Z deb [arch=amd64] https://repo.radeon.com/rocm/apt/6.2.1 jammy main 2025-08-14T21:18:48.3616878Z 6.2.1-112 2025-08-14T21:18:48.3629407Z pytorchci 2025-08-14T21:18:48.3658243Z ##[group]Run dpkg -l | grep -E " amdgpu" 2025-08-14T21:18:48.3658571Z dpkg -l | grep -E " amdgpu" 2025-08-14T21:18:48.3693545Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-08-14T21:18:48.3693907Z env: 2025-08-14T21:18:48.3694099Z GIT_DEFAULT_BRANCH: main 2025-08-14T21:18:48.3694329Z ##[endgroup] 2025-08-14T21:18:48.3888069Z ii amdgpu-core 1:6.2.60201-2038383.22.04 all Core meta package for unified amdgpu driver. 2025-08-14T21:18:48.3889322Z ii amdgpu-dkms 1:6.8.5.60201-2038383.22.04 all amdgpu driver in DKMS format. 2025-08-14T21:18:48.3891127Z ii amdgpu-dkms-firmware 1:6.8.5.60201-2038383.22.04 all firmware blobs used by amdgpu driver in DKMS format 2025-08-14T21:18:48.3892531Z ii amdgpu-install 6.2.60201-2038383.22.04 all AMDGPU driver repository and installer 2025-08-14T21:18:48.3924828Z ##[group]Run rocm-smi 2025-08-14T21:18:48.3925062Z rocm-smi 2025-08-14T21:18:48.3963698Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-08-14T21:18:48.3964349Z env: 2025-08-14T21:18:48.3964732Z GIT_DEFAULT_BRANCH: main 2025-08-14T21:18:48.3965160Z ##[endgroup] 2025-08-14T21:18:48.4876298Z 2025-08-14T21:18:48.4876502Z 2025-08-14T21:18:48.4877096Z ========================================= ROCm System Management Interface ========================================= 2025-08-14T21:18:48.4878041Z =================================================== Concise Info =================================================== 2025-08-14T21:18:48.4879125Z Device Node IDs Temp Power Partitions SCLK MCLK Fan Perf PwrCap VRAM% GPU% 2025-08-14T21:18:48.4881329Z  (DID, GUID) (Edge) (Avg) (Mem, Compute, ID)  2025-08-14T21:18:48.4882126Z ==================================================================================================================== 2025-08-14T21:18:48.4883360Z 0 2 0x740f, 12261 43.0°C 40.0W N/A, N/A, 0 800Mhz 1600Mhz 0% auto 300.0W 0% 0% 2025-08-14T21:18:48.4884485Z 1 3 0x740f, 36740 39.0°C 41.0W N/A, N/A, 0 800Mhz 1600Mhz 0% auto 300.0W 0% 0% 2025-08-14T21:18:48.4885256Z ==================================================================================================================== 2025-08-14T21:18:48.4885918Z =============================================== End of ROCm SMI Log ================================================ 2025-08-14T21:18:48.5009709Z ##[group]Run rocminfo 2025-08-14T21:18:48.5009955Z rocminfo 2025-08-14T21:18:48.5057819Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-08-14T21:18:48.5058100Z env: 2025-08-14T21:18:48.5058267Z GIT_DEFAULT_BRANCH: main 2025-08-14T21:18:48.5058447Z ##[endgroup] 2025-08-14T21:18:48.6073535Z ROCk module version 6.8.5 is loaded 2025-08-14T21:18:48.6074261Z ===================== 2025-08-14T21:18:48.6074816Z HSA System Attributes 2025-08-14T21:18:48.6075254Z ===================== 2025-08-14T21:18:48.6075711Z Runtime Version: 1.14 2025-08-14T21:18:48.6076183Z Runtime Ext Version: 1.6 2025-08-14T21:18:48.6076693Z System Timestamp Freq.: 1000.000000MHz 2025-08-14T21:18:48.6077506Z Sig. Max Wait Duration: 18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count) 2025-08-14T21:18:48.6078444Z Machine Model: LARGE 2025-08-14T21:18:48.6079160Z System Endianness: LITTLE 2025-08-14T21:18:48.6079785Z Mwaitx: DISABLED 2025-08-14T21:18:48.6080732Z DMAbuf Support: YES 2025-08-14T21:18:48.6081053Z 2025-08-14T21:18:48.6081209Z ========== 2025-08-14T21:18:48.6081638Z HSA Agents 2025-08-14T21:18:48.6082018Z ========== 2025-08-14T21:18:48.6082409Z ******* 2025-08-14T21:18:48.6082798Z Agent 1 2025-08-14T21:18:48.6083177Z ******* 2025-08-14T21:18:48.6083661Z Name: AMD EPYC 7513 32-Core Processor 2025-08-14T21:18:48.6084293Z Uuid: CPU-XX 2025-08-14T21:18:48.6084928Z Marketing Name: AMD EPYC 7513 32-Core Processor 2025-08-14T21:18:48.6085588Z Vendor Name: CPU 2025-08-14T21:18:48.6086207Z Feature: None specified 2025-08-14T21:18:48.6086820Z Profile: FULL_PROFILE 2025-08-14T21:18:48.6087940Z Float Round Mode: NEAR 2025-08-14T21:18:48.6088645Z Max Queue Number: 0(0x0) 2025-08-14T21:18:48.6089268Z Queue Min Size: 0(0x0) 2025-08-14T21:18:48.6089883Z Queue Max Size: 0(0x0) 2025-08-14T21:18:48.6090489Z Queue Type: MULTI 2025-08-14T21:18:48.6091064Z Node: 0 2025-08-14T21:18:48.6091637Z Device Type: CPU 2025-08-14T21:18:48.6092182Z Cache Info: 2025-08-14T21:18:48.6092631Z L1: 32768(0x8000) KB 2025-08-14T21:18:48.6093178Z Chip ID: 0(0x0) 2025-08-14T21:18:48.6093770Z ASIC Revision: 0(0x0) 2025-08-14T21:18:48.6094389Z Cacheline Size: 64(0x40) 2025-08-14T21:18:48.6095370Z Max Clock Freq. (MHz): 2600 2025-08-14T21:18:48.6095960Z BDFID: 0 2025-08-14T21:18:48.6096558Z Internal Node ID: 0 2025-08-14T21:18:48.6097177Z Compute Unit: 32 2025-08-14T21:18:48.6097777Z SIMDs per CU: 0 2025-08-14T21:18:48.6098384Z Shader Engines: 0 2025-08-14T21:18:48.6099007Z Shader Arrs. per Eng.: 0 2025-08-14T21:18:48.6099684Z WatchPts on Addr. Ranges:1 2025-08-14T21:18:48.6100279Z Memory Properties: 2025-08-14T21:18:48.6100725Z Features: None 2025-08-14T21:18:48.6101150Z Pool Info: 2025-08-14T21:18:48.6101558Z Pool 1 2025-08-14T21:18:48.6102191Z Segment: GLOBAL; FLAGS: FINE GRAINED 2025-08-14T21:18:48.6102947Z Size: 65790668(0x3ebe2cc) KB 2025-08-14T21:18:48.6103699Z Allocatable: TRUE 2025-08-14T21:18:48.6104445Z Alloc Granule: 4KB 2025-08-14T21:18:48.6105194Z Alloc Recommended Granule:4KB 2025-08-14T21:18:48.6105876Z Alloc Alignment: 4KB 2025-08-14T21:18:48.6106546Z Accessible by all: TRUE 2025-08-14T21:18:48.6107114Z Pool 2 2025-08-14T21:18:48.6107624Z Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED 2025-08-14T21:18:48.6108237Z Size: 65790668(0x3ebe2cc) KB 2025-08-14T21:18:48.6108831Z Allocatable: TRUE 2025-08-14T21:18:48.6109473Z Alloc Granule: 4KB 2025-08-14T21:18:48.6110141Z Alloc Recommended Granule:4KB 2025-08-14T21:18:48.6110830Z Alloc Alignment: 4KB 2025-08-14T21:18:48.6111473Z Accessible by all: TRUE 2025-08-14T21:18:48.6112033Z Pool 3 2025-08-14T21:18:48.6112544Z Segment: GLOBAL; FLAGS: COARSE GRAINED 2025-08-14T21:18:48.6113141Z Size: 65790668(0x3ebe2cc) KB 2025-08-14T21:18:48.6113751Z Allocatable: TRUE 2025-08-14T21:18:48.6114379Z Alloc Granule: 4KB 2025-08-14T21:18:48.6115056Z Alloc Recommended Granule:4KB 2025-08-14T21:18:48.6115722Z Alloc Alignment: 4KB 2025-08-14T21:18:48.6116662Z Accessible by all: TRUE 2025-08-14T21:18:48.6117279Z ISA Info: 2025-08-14T21:18:48.6117677Z ******* 2025-08-14T21:18:48.6118087Z Agent 2 2025-08-14T21:18:48.6118465Z ******* 2025-08-14T21:18:48.6118936Z Name: AMD EPYC 7513 32-Core Processor 2025-08-14T21:18:48.6119523Z Uuid: CPU-XX 2025-08-14T21:18:48.6120266Z Marketing Name: AMD EPYC 7513 32-Core Processor 2025-08-14T21:18:48.6121048Z Vendor Name: CPU 2025-08-14T21:18:48.6121693Z Feature: None specified 2025-08-14T21:18:48.6122307Z Profile: FULL_PROFILE 2025-08-14T21:18:48.6122941Z Float Round Mode: NEAR 2025-08-14T21:18:48.6123577Z Max Queue Number: 0(0x0) 2025-08-14T21:18:48.6124518Z Queue Min Size: 0(0x0) 2025-08-14T21:18:48.6125124Z Queue Max Size: 0(0x0) 2025-08-14T21:18:48.6125735Z Queue Type: MULTI 2025-08-14T21:18:48.6126311Z Node: 1 2025-08-14T21:18:48.6126887Z Device Type: CPU 2025-08-14T21:18:48.6127438Z Cache Info: 2025-08-14T21:18:48.6127876Z L1: 32768(0x8000) KB 2025-08-14T21:18:48.6128435Z Chip ID: 0(0x0) 2025-08-14T21:18:48.6129027Z ASIC Revision: 0(0x0) 2025-08-14T21:18:48.6129660Z Cacheline Size: 64(0x40) 2025-08-14T21:18:48.6130306Z Max Clock Freq. (MHz): 2600 2025-08-14T21:18:48.6130900Z BDFID: 0 2025-08-14T21:18:48.6131510Z Internal Node ID: 1 2025-08-14T21:18:48.6132240Z Compute Unit: 32 2025-08-14T21:18:48.6132963Z SIMDs per CU: 0 2025-08-14T21:18:48.6133681Z Shader Engines: 0 2025-08-14T21:18:48.6134448Z Shader Arrs. per Eng.: 0 2025-08-14T21:18:48.6135240Z WatchPts on Addr. Ranges:1 2025-08-14T21:18:48.6135919Z Memory Properties: 2025-08-14T21:18:48.6136444Z Features: None 2025-08-14T21:18:48.6136937Z Pool Info: 2025-08-14T21:18:48.6137429Z Pool 1 2025-08-14T21:18:48.6138016Z Segment: GLOBAL; FLAGS: FINE GRAINED 2025-08-14T21:18:48.6138643Z Size: 66046468(0x3efca04) KB 2025-08-14T21:18:48.6139277Z Allocatable: TRUE 2025-08-14T21:18:48.6139927Z Alloc Granule: 4KB 2025-08-14T21:18:48.6140612Z Alloc Recommended Granule:4KB 2025-08-14T21:18:48.6141292Z Alloc Alignment: 4KB 2025-08-14T21:18:48.6141963Z Accessible by all: TRUE 2025-08-14T21:18:48.6142525Z Pool 2 2025-08-14T21:18:48.6143050Z Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED 2025-08-14T21:18:48.6143777Z Size: 66046468(0x3efca04) KB 2025-08-14T21:18:48.6144506Z Allocatable: TRUE 2025-08-14T21:18:48.6145272Z Alloc Granule: 4KB 2025-08-14T21:18:48.6146061Z Alloc Recommended Granule:4KB 2025-08-14T21:18:48.6147111Z Alloc Alignment: 4KB 2025-08-14T21:18:48.6147779Z Accessible by all: TRUE 2025-08-14T21:18:48.6148353Z Pool 3 2025-08-14T21:18:48.6148848Z Segment: GLOBAL; FLAGS: COARSE GRAINED 2025-08-14T21:18:48.6149461Z Size: 66046468(0x3efca04) KB 2025-08-14T21:18:48.6150130Z Allocatable: TRUE 2025-08-14T21:18:48.6150789Z Alloc Granule: 4KB 2025-08-14T21:18:48.6151460Z Alloc Recommended Granule:4KB 2025-08-14T21:18:48.6152137Z Alloc Alignment: 4KB 2025-08-14T21:18:48.6152795Z Accessible by all: TRUE 2025-08-14T21:18:48.6153368Z ISA Info: 2025-08-14T21:18:48.6153762Z ******* 2025-08-14T21:18:48.6154401Z Agent 3 2025-08-14T21:18:48.6154795Z ******* 2025-08-14T21:18:48.6155220Z Name: gfx90a 2025-08-14T21:18:48.6155808Z Uuid: GPU-57cd99ed4cd4643e 2025-08-14T21:18:48.6156436Z Marketing Name: AMD Instinct MI210 2025-08-14T21:18:48.6157086Z Vendor Name: AMD 2025-08-14T21:18:48.6157697Z Feature: KERNEL_DISPATCH 2025-08-14T21:18:48.6158327Z Profile: BASE_PROFILE 2025-08-14T21:18:48.6158952Z Float Round Mode: NEAR 2025-08-14T21:18:48.6159603Z Max Queue Number: 128(0x80) 2025-08-14T21:18:48.6160239Z Queue Min Size: 64(0x40) 2025-08-14T21:18:48.6160953Z Queue Max Size: 131072(0x20000) 2025-08-14T21:18:48.6161592Z Queue Type: MULTI 2025-08-14T21:18:48.6162162Z Node: 2 2025-08-14T21:18:48.6162759Z Device Type: GPU 2025-08-14T21:18:48.6163296Z Cache Info: 2025-08-14T21:18:48.6163751Z L1: 16(0x10) KB 2025-08-14T21:18:48.6164285Z L2: 8192(0x2000) KB 2025-08-14T21:18:48.6164831Z Chip ID: 29711(0x740f) 2025-08-14T21:18:48.6165438Z ASIC Revision: 1(0x1) 2025-08-14T21:18:48.6166062Z Cacheline Size: 64(0x40) 2025-08-14T21:18:48.6166708Z Max Clock Freq. (MHz): 1700 2025-08-14T21:18:48.6167295Z BDFID: 768 2025-08-14T21:18:48.6167914Z Internal Node ID: 2 2025-08-14T21:18:48.6168551Z Compute Unit: 104 2025-08-14T21:18:48.6169154Z SIMDs per CU: 4 2025-08-14T21:18:48.6169778Z Shader Engines: 8 2025-08-14T21:18:48.6170419Z Shader Arrs. per Eng.: 1 2025-08-14T21:18:48.6171087Z WatchPts on Addr. Ranges:4 2025-08-14T21:18:48.6171784Z Coherent Host Access: FALSE 2025-08-14T21:18:48.6172475Z Memory Properties: 2025-08-14T21:18:48.6173003Z Features: KERNEL_DISPATCH 2025-08-14T21:18:48.6173707Z Fast F16 Operation: TRUE 2025-08-14T21:18:48.6174483Z Wavefront Size: 64(0x40) 2025-08-14T21:18:48.6175247Z Workgroup Max Size: 1024(0x400) 2025-08-14T21:18:48.6176362Z Workgroup Max Size per Dimension: 2025-08-14T21:18:48.6176972Z x 1024(0x400) 2025-08-14T21:18:48.6177587Z y 1024(0x400) 2025-08-14T21:18:48.6178157Z z 1024(0x400) 2025-08-14T21:18:48.6178736Z Max Waves Per CU: 32(0x20) 2025-08-14T21:18:48.6179397Z Max Work-item Per CU: 2048(0x800) 2025-08-14T21:18:48.6180029Z Grid Max Size: 4294967295(0xffffffff) 2025-08-14T21:18:48.6180596Z Grid Max Size per Dimension: 2025-08-14T21:18:48.6181047Z x 4294967295(0xffffffff) 2025-08-14T21:18:48.6181579Z y 4294967295(0xffffffff) 2025-08-14T21:18:48.6182094Z z 4294967295(0xffffffff) 2025-08-14T21:18:48.6182704Z Max fbarriers/Workgrp: 32 2025-08-14T21:18:48.6192155Z Packet Processor uCode:: 83 2025-08-14T21:18:48.6192940Z SDMA engine uCode:: 8 2025-08-14T21:18:48.6193637Z IOMMU Support:: None 2025-08-14T21:18:48.6194218Z Pool Info: 2025-08-14T21:18:48.6194672Z Pool 1 2025-08-14T21:18:48.6195209Z Segment: GLOBAL; FLAGS: COARSE GRAINED 2025-08-14T21:18:48.6195874Z Size: 67092480(0x3ffc000) KB 2025-08-14T21:18:48.6196503Z Allocatable: TRUE 2025-08-14T21:18:48.6197141Z Alloc Granule: 4KB 2025-08-14T21:18:48.6197824Z Alloc Recommended Granule:2048KB 2025-08-14T21:18:48.6198500Z Alloc Alignment: 4KB 2025-08-14T21:18:48.6199170Z Accessible by all: FALSE 2025-08-14T21:18:48.6199747Z Pool 2 2025-08-14T21:18:48.6200281Z Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED 2025-08-14T21:18:48.6201068Z Size: 67092480(0x3ffc000) KB 2025-08-14T21:18:48.6201700Z Allocatable: TRUE 2025-08-14T21:18:48.6202349Z Alloc Granule: 4KB 2025-08-14T21:18:48.6203025Z Alloc Recommended Granule:2048KB 2025-08-14T21:18:48.6203730Z Alloc Alignment: 4KB 2025-08-14T21:18:48.6204396Z Accessible by all: FALSE 2025-08-14T21:18:48.6204971Z Pool 3 2025-08-14T21:18:48.6205476Z Segment: GROUP 2025-08-14T21:18:48.6206066Z Size: 64(0x40) KB 2025-08-14T21:18:48.6206684Z Allocatable: FALSE 2025-08-14T21:18:48.6207306Z Alloc Granule: 0KB 2025-08-14T21:18:48.6207981Z Alloc Recommended Granule:0KB 2025-08-14T21:18:48.6208652Z Alloc Alignment: 0KB 2025-08-14T21:18:48.6209314Z Accessible by all: FALSE 2025-08-14T21:18:48.6209884Z ISA Info: 2025-08-14T21:18:48.6210285Z ISA 1 2025-08-14T21:18:48.6210845Z Name: amdgcn-amd-amdhsa--gfx90a:sramecc+:xnack- 2025-08-14T21:18:48.6211570Z Machine Models: HSA_MACHINE_MODEL_LARGE 2025-08-14T21:18:48.6212370Z Profiles: HSA_PROFILE_BASE 2025-08-14T21:18:48.6213155Z Default Rounding Mode: NEAR 2025-08-14T21:18:48.6214674Z Default Rounding Mode: NEAR 2025-08-14T21:18:48.6215560Z Fast f16: TRUE 2025-08-14T21:18:48.6216307Z Workgroup Max Size: 1024(0x400) 2025-08-14T21:18:48.6217040Z Workgroup Max Size per Dimension: 2025-08-14T21:18:48.6217654Z x 1024(0x400) 2025-08-14T21:18:48.6218218Z y 1024(0x400) 2025-08-14T21:18:48.6218716Z z 1024(0x400) 2025-08-14T21:18:48.6219296Z Grid Max Size: 4294967295(0xffffffff) 2025-08-14T21:18:48.6219877Z Grid Max Size per Dimension: 2025-08-14T21:18:48.6220348Z x 4294967295(0xffffffff) 2025-08-14T21:18:48.6220874Z y 4294967295(0xffffffff) 2025-08-14T21:18:48.6221390Z z 4294967295(0xffffffff) 2025-08-14T21:18:48.6222284Z FBarrier Max Size: 32 2025-08-14T21:18:48.6222829Z ******* 2025-08-14T21:18:48.6223282Z Agent 4 2025-08-14T21:18:48.6223731Z ******* 2025-08-14T21:18:48.6224213Z Name: gfx90a 2025-08-14T21:18:48.6224869Z Uuid: GPU-6b7c2bd8af06ff6a 2025-08-14T21:18:48.6225568Z Marketing Name: AMD Instinct MI210 2025-08-14T21:18:48.6226255Z Vendor Name: AMD 2025-08-14T21:18:48.6226826Z Feature: KERNEL_DISPATCH 2025-08-14T21:18:48.6227408Z Profile: BASE_PROFILE 2025-08-14T21:18:48.6227987Z Float Round Mode: NEAR 2025-08-14T21:18:48.6228582Z Max Queue Number: 128(0x80) 2025-08-14T21:18:48.6229170Z Queue Min Size: 64(0x40) 2025-08-14T21:18:48.6229738Z Queue Max Size: 131072(0x20000) 2025-08-14T21:18:48.6230314Z Queue Type: MULTI 2025-08-14T21:18:48.6230842Z Node: 3 2025-08-14T21:18:48.6231383Z Device Type: GPU 2025-08-14T21:18:48.6231886Z Cache Info: 2025-08-14T21:18:48.6232353Z L1: 16(0x10) KB 2025-08-14T21:18:48.6232884Z L2: 8192(0x2000) KB 2025-08-14T21:18:48.6233418Z Chip ID: 29711(0x740f) 2025-08-14T21:18:48.6234011Z ASIC Revision: 1(0x1) 2025-08-14T21:18:48.6234633Z Cacheline Size: 64(0x40) 2025-08-14T21:18:48.6235292Z Max Clock Freq. (MHz): 1700 2025-08-14T21:18:48.6235880Z BDFID: 33536 2025-08-14T21:18:48.6236473Z Internal Node ID: 3 2025-08-14T21:18:48.6237090Z Compute Unit: 104 2025-08-14T21:18:48.6237685Z SIMDs per CU: 4 2025-08-14T21:18:48.6238293Z Shader Engines: 8 2025-08-14T21:18:48.6238918Z Shader Arrs. per Eng.: 1 2025-08-14T21:18:48.6239583Z WatchPts on Addr. Ranges:4 2025-08-14T21:18:48.6240250Z Coherent Host Access: FALSE 2025-08-14T21:18:48.6240936Z Memory Properties: 2025-08-14T21:18:48.6241403Z Features: KERNEL_DISPATCH 2025-08-14T21:18:48.6241999Z Fast F16 Operation: TRUE 2025-08-14T21:18:48.6242982Z Wavefront Size: 64(0x40) 2025-08-14T21:18:48.6243636Z Workgroup Max Size: 1024(0x400) 2025-08-14T21:18:48.6244231Z Workgroup Max Size per Dimension: 2025-08-14T21:18:48.6244721Z x 1024(0x400) 2025-08-14T21:18:48.6245234Z y 1024(0x400) 2025-08-14T21:18:48.6245728Z z 1024(0x400) 2025-08-14T21:18:48.6246291Z Max Waves Per CU: 32(0x20) 2025-08-14T21:18:48.6246923Z Max Work-item Per CU: 2048(0x800) 2025-08-14T21:18:48.6247544Z Grid Max Size: 4294967295(0xffffffff) 2025-08-14T21:18:48.6248105Z Grid Max Size per Dimension: 2025-08-14T21:18:48.6248545Z x 4294967295(0xffffffff) 2025-08-14T21:18:48.6249072Z y 4294967295(0xffffffff) 2025-08-14T21:18:48.6250056Z z 4294967295(0xffffffff) 2025-08-14T21:18:48.6250661Z Max fbarriers/Workgrp: 32 2025-08-14T21:18:48.6251359Z Packet Processor uCode:: 83 2025-08-14T21:18:48.6252113Z SDMA engine uCode:: 8 2025-08-14T21:18:48.6252887Z IOMMU Support:: None 2025-08-14T21:18:48.6253532Z Pool Info: 2025-08-14T21:18:48.6254018Z Pool 1 2025-08-14T21:18:48.6254634Z Segment: GLOBAL; FLAGS: COARSE GRAINED 2025-08-14T21:18:48.6255396Z Size: 67092480(0x3ffc000) KB 2025-08-14T21:18:48.6256123Z Allocatable: TRUE 2025-08-14T21:18:48.6256880Z Alloc Granule: 4KB 2025-08-14T21:18:48.6257676Z Alloc Recommended Granule:2048KB 2025-08-14T21:18:48.6258359Z Alloc Alignment: 4KB 2025-08-14T21:18:48.6258980Z Accessible by all: FALSE 2025-08-14T21:18:48.6259500Z Pool 2 2025-08-14T21:18:48.6259983Z Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED 2025-08-14T21:18:48.6260553Z Size: 67092480(0x3ffc000) KB 2025-08-14T21:18:48.6261115Z Allocatable: TRUE 2025-08-14T21:18:48.6261715Z Alloc Granule: 4KB 2025-08-14T21:18:48.6262331Z Alloc Recommended Granule:2048KB 2025-08-14T21:18:48.6262969Z Alloc Alignment: 4KB 2025-08-14T21:18:48.6263692Z Accessible by all: FALSE 2025-08-14T21:18:48.6264382Z Pool 3 2025-08-14T21:18:48.6264952Z Segment: GROUP 2025-08-14T21:18:48.6265626Z Size: 64(0x40) KB 2025-08-14T21:18:48.6266332Z Allocatable: FALSE 2025-08-14T21:18:48.6266967Z Alloc Granule: 0KB 2025-08-14T21:18:48.6267635Z Alloc Recommended Granule:0KB 2025-08-14T21:18:48.6268299Z Alloc Alignment: 0KB 2025-08-14T21:18:48.6268957Z Accessible by all: FALSE 2025-08-14T21:18:48.6269526Z ISA Info: 2025-08-14T21:18:48.6269917Z ISA 1 2025-08-14T21:18:48.6270444Z Name: amdgcn-amd-amdhsa--gfx90a:sramecc+:xnack- 2025-08-14T21:18:48.6271133Z Machine Models: HSA_MACHINE_MODEL_LARGE 2025-08-14T21:18:48.6272089Z Profiles: HSA_PROFILE_BASE 2025-08-14T21:18:48.6272758Z Default Rounding Mode: NEAR 2025-08-14T21:18:48.6273435Z Default Rounding Mode: NEAR 2025-08-14T21:18:48.6274048Z Fast f16: TRUE 2025-08-14T21:18:48.6274666Z Workgroup Max Size: 1024(0x400) 2025-08-14T21:18:48.6275261Z Workgroup Max Size per Dimension: 2025-08-14T21:18:48.6275786Z x 1024(0x400) 2025-08-14T21:18:48.6276317Z y 1024(0x400) 2025-08-14T21:18:48.6276821Z z 1024(0x400) 2025-08-14T21:18:48.6277388Z Grid Max Size: 4294967295(0xffffffff) 2025-08-14T21:18:48.6277952Z Grid Max Size per Dimension: 2025-08-14T21:18:48.6278733Z x 4294967295(0xffffffff) 2025-08-14T21:18:48.6279264Z y 4294967295(0xffffffff) 2025-08-14T21:18:48.6279781Z z 4294967295(0xffffffff) 2025-08-14T21:18:48.6280459Z FBarrier Max Size: 32 2025-08-14T21:18:48.6281013Z *** Done *** 2025-08-14T21:18:48.6317150Z ##[group]Run ngpu=$(rocminfo | grep -c -E 'Name:.*\sgfx') 2025-08-14T21:18:48.6317888Z ngpu=$(rocminfo | grep -c -E 'Name:.*\sgfx') 2025-08-14T21:18:48.6319131Z msg="Please file an issue on pytorch/pytorch reporting the faulty runner. Include a link to the runner logs so the runner can be identified" 2025-08-14T21:18:48.6320311Z if [[ $ngpu -eq 0 ]]; then 2025-08-14T21:18:48.6321071Z  echo "Error: Failed to detect any GPUs on the runner" 2025-08-14T21:18:48.6321685Z  echo "$msg" 2025-08-14T21:18:48.6322115Z  exit 1 2025-08-14T21:18:48.6322476Z fi 2025-08-14T21:18:48.6374389Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-08-14T21:18:48.6375126Z env: 2025-08-14T21:18:48.6375537Z GIT_DEFAULT_BRANCH: main 2025-08-14T21:18:48.6376040Z ##[endgroup] 2025-08-14T21:18:48.7654695Z ##[group]Run pytorch/pytorch/.github/actions/diskspace-cleanup@main 2025-08-14T21:18:48.7655517Z with: 2025-08-14T21:18:48.7655963Z diskspace-cutoff: 70 2025-08-14T21:18:48.7656433Z env: 2025-08-14T21:18:48.7656867Z GIT_DEFAULT_BRANCH: main 2025-08-14T21:18:48.7657333Z ##[endgroup] 2025-08-14T21:18:48.7721910Z ##[group]Run set -ex 2025-08-14T21:18:48.7722410Z set -ex 2025-08-14T21:18:48.7722796Z diskspace_cutoff=70 2025-08-14T21:18:48.7723417Z docker_root_dir=$(docker info -f '{{.DockerRootDir}}') 2025-08-14T21:18:48.7724094Z if [ ! -d "$docker_root_dir" ]; then 2025-08-14T21:18:48.7724989Z  echo "Docker root directory ($docker_root_dir) does not exist. Skipping disk space check." 2025-08-14T21:18:48.7725810Z  exit 0 2025-08-14T21:18:48.7726187Z fi 2025-08-14T21:18:48.7726885Z diskspace=$(df -H --output=pcent ${docker_root_dir} | sed -n 2p | sed 's/%//' | sed 's/ //') 2025-08-14T21:18:48.7728405Z msg="Please file an issue on pytorch/pytorch reporting the faulty runner. Include a link to the runner logs so the runner can be identified" 2025-08-14T21:18:48.7729689Z if [[ "$diskspace" -ge "$diskspace_cutoff" ]] ; then 2025-08-14T21:18:48.7730307Z  docker system prune -af 2025-08-14T21:18:48.7731159Z  diskspace_new=$(df -H --output=pcent ${docker_root_dir} | sed -n 2p | sed 's/%//' | sed 's/ //') 2025-08-14T21:18:48.7732269Z  if [[ "$diskspace_new" -gt "$diskspace_cutoff" ]] ; then 2025-08-14T21:18:48.7733419Z  echo "Error: Available diskspace is less than $diskspace_cutoff percent. Not enough diskspace." 2025-08-14T21:18:48.7734878Z  echo "$msg" 2025-08-14T21:18:48.7735404Z  exit 1 2025-08-14T21:18:48.7735871Z  else 2025-08-14T21:18:48.7736414Z  difference=$((diskspace - diskspace_new)) 2025-08-14T21:18:48.7737189Z  echo "Diskspace saved: $difference percent" 2025-08-14T21:18:48.7737767Z  fi 2025-08-14T21:18:48.7738088Z fi 2025-08-14T21:18:48.7783651Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-08-14T21:18:48.7784239Z env: 2025-08-14T21:18:48.7784575Z GIT_DEFAULT_BRANCH: main 2025-08-14T21:18:48.7784959Z ##[endgroup] 2025-08-14T21:18:48.7862066Z + diskspace_cutoff=70 2025-08-14T21:18:48.7869076Z ++ docker info -f '{{.DockerRootDir}}' 2025-08-14T21:18:48.8484911Z + docker_root_dir=/home/pytorchci/.local/share/docker 2025-08-14T21:18:48.8485591Z + '[' '!' -d /home/pytorchci/.local/share/docker ']' 2025-08-14T21:18:48.8496598Z ++ df -H --output=pcent /home/pytorchci/.local/share/docker 2025-08-14T21:18:48.8497539Z ++ sed -n 2p 2025-08-14T21:18:48.8498169Z ++ sed s/%// 2025-08-14T21:18:48.8499506Z ++ sed 's/ //' 2025-08-14T21:18:48.8527249Z + diskspace=60 2025-08-14T21:18:48.8528065Z + msg='Please file an issue on pytorch/pytorch reporting the faulty runner. Include a link to the runner logs so the runner can be identified' 2025-08-14T21:18:48.8528952Z + [[ 60 -ge 70 ]] 2025-08-14T21:18:48.8576020Z ##[group]Run RUNNER_ARTIFACT_DIR="${RUNNER_TEMP}/artifacts" 2025-08-14T21:18:48.8576503Z RUNNER_ARTIFACT_DIR="${RUNNER_TEMP}/artifacts" 2025-08-14T21:18:48.8576840Z rm -rf "${RUNNER_ARTIFACT_DIR}" 2025-08-14T21:18:48.8577138Z mkdir -p "${RUNNER_ARTIFACT_DIR}" 2025-08-14T21:18:48.8577521Z echo "RUNNER_ARTIFACT_DIR=${RUNNER_ARTIFACT_DIR}" >> "${GITHUB_ENV}" 2025-08-14T21:18:48.8577887Z  2025-08-14T21:18:48.8578143Z RUNNER_TEST_RESULTS_DIR="${RUNNER_TEMP}/test-results" 2025-08-14T21:18:48.8578499Z rm -rf "${RUNNER_TEST_RESULTS_DIR}" 2025-08-14T21:18:48.8578833Z mkdir -p "${RUNNER_TEST_RESULTS_DIR}" 2025-08-14T21:18:48.8579255Z echo "RUNNER_TEST_RESULTS_DIR=${RUNNER_TEST_RESULTS_DIR}" >> "${GITHUB_ENV}" 2025-08-14T21:18:48.8579642Z  2025-08-14T21:18:48.8579843Z RUNNER_DOCS_DIR="${RUNNER_TEMP}/docs" 2025-08-14T21:18:48.8580130Z rm -rf "${RUNNER_DOCS_DIR}" 2025-08-14T21:18:48.8580389Z mkdir -p "${RUNNER_DOCS_DIR}" 2025-08-14T21:18:48.8580726Z echo "RUNNER_DOCS_DIR=${RUNNER_DOCS_DIR}" >> "${GITHUB_ENV}" 2025-08-14T21:18:48.8620497Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-08-14T21:18:48.8620835Z env: 2025-08-14T21:18:48.8621023Z GIT_DEFAULT_BRANCH: main 2025-08-14T21:18:48.8621251Z ##[endgroup] 2025-08-14T21:18:48.8806184Z ##[group]Run env | grep '^GITHUB' >> "${RUNNER_TEMP}/github_env_${GITHUB_RUN_ID}" 2025-08-14T21:18:48.8807128Z env | grep '^GITHUB' >> "${RUNNER_TEMP}/github_env_${GITHUB_RUN_ID}" 2025-08-14T21:18:48.8807964Z env | grep '^CI' >> "${RUNNER_TEMP}/github_env_${GITHUB_RUN_ID}" 2025-08-14T21:18:48.8857966Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-08-14T21:18:48.8858557Z env: 2025-08-14T21:18:48.8858891Z GIT_DEFAULT_BRANCH: main 2025-08-14T21:18:48.8859515Z RUNNER_ARTIFACT_DIR: /home/pytorchci/actions-runner/_work/_temp/artifacts 2025-08-14T21:18:48.8860436Z RUNNER_TEST_RESULTS_DIR: /home/pytorchci/actions-runner/_work/_temp/test-results 2025-08-14T21:18:48.8861358Z RUNNER_DOCS_DIR: /home/pytorchci/actions-runner/_work/_temp/docs 2025-08-14T21:18:48.8861984Z ##[endgroup] 2025-08-14T21:18:48.9019463Z ##[group]Run # All GPUs are visible to the runner; visibility, if needed, will be set by run_test.py. 2025-08-14T21:18:48.9020130Z # All GPUs are visible to the runner; visibility, if needed, will be set by run_test.py. 2025-08-14T21:18:48.9020592Z # Add render group for container creation. 2025-08-14T21:18:48.9020966Z render_gid=`cat /etc/group | grep render | cut -d: -f3` 2025-08-14T21:18:48.9021681Z # Ensure GPU isolation if pod is part of kubernetes setup with DEVICE_FLAG. 2025-08-14T21:18:48.9022163Z if [ -f "/etc/podinfo/gha-render-devices" ]; then 2025-08-14T21:18:48.9022605Z  DEVICE_FLAG=$(cat /etc/podinfo/gha-render-devices) 2025-08-14T21:18:48.9023140Z else 2025-08-14T21:18:48.9023499Z  DEVICE_FLAG="--device /dev/dri" 2025-08-14T21:18:48.9023945Z fi 2025-08-14T21:18:48.9024650Z # The --group-add daemon and --group-add bin are needed in the Ubuntu 24.04 and Almalinux OSs respectively. 2025-08-14T21:18:48.9025689Z # This is due to the device files (/dev/kfd & /dev/dri) being owned by video group on bare metal. 2025-08-14T21:18:48.9026541Z # This video group ID maps to subgid 1 inside the docker image due to the /etc/subgid entries. 2025-08-14T21:18:48.9027418Z # The group name corresponding to group ID 1 can change depending on the OS, so both are necessary. 2025-08-14T21:18:48.9029207Z echo "GPU_FLAG=--device=/dev/mem --device=/dev/kfd $DEVICE_FLAG --group-add video --group-add $render_gid --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host" >> "${GITHUB_ENV}" 2025-08-14T21:18:48.9065995Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-08-14T21:18:48.9066469Z env: 2025-08-14T21:18:48.9066749Z GIT_DEFAULT_BRANCH: main 2025-08-14T21:18:48.9067267Z RUNNER_ARTIFACT_DIR: /home/pytorchci/actions-runner/_work/_temp/artifacts 2025-08-14T21:18:48.9068035Z RUNNER_TEST_RESULTS_DIR: /home/pytorchci/actions-runner/_work/_temp/test-results 2025-08-14T21:18:48.9068754Z RUNNER_DOCS_DIR: /home/pytorchci/actions-runner/_work/_temp/docs 2025-08-14T21:18:48.9069235Z ##[endgroup] 2025-08-14T21:18:48.9375724Z ##[group]Run aws-actions/configure-aws-credentials@ececac1a45f3b08a01d2dd070d28d111c5fe6722 2025-08-14T21:18:48.9376573Z with: 2025-08-14T21:18:48.9377088Z role-to-assume: arn:aws:iam::308535385114:role/gha_workflow_s3_and_ecr_read_only 2025-08-14T21:18:48.9377671Z aws-region: us-east-1 2025-08-14T21:18:48.9378006Z role-duration-seconds: 18000 2025-08-14T21:18:48.9378387Z audience: sts.amazonaws.com 2025-08-14T21:18:48.9378714Z env: 2025-08-14T21:18:48.9378980Z GIT_DEFAULT_BRANCH: main 2025-08-14T21:18:48.9379481Z RUNNER_ARTIFACT_DIR: /home/pytorchci/actions-runner/_work/_temp/artifacts 2025-08-14T21:18:48.9380233Z RUNNER_TEST_RESULTS_DIR: /home/pytorchci/actions-runner/_work/_temp/test-results 2025-08-14T21:18:48.9380930Z RUNNER_DOCS_DIR: /home/pytorchci/actions-runner/_work/_temp/docs 2025-08-14T21:18:48.9382242Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --device /dev/dri --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-08-14T21:18:48.9383515Z ##[endgroup] 2025-08-14T21:18:49.3641651Z Assuming role with OIDC 2025-08-14T21:18:49.7455981Z Authenticated as assumedRoleId AROAUPVRELQNLLCOPFEJR:GitHubActions 2025-08-14T21:18:49.8632192Z ##[group]Run aws-actions/amazon-ecr-login@062b18b96a7aff071d4dc91bc00c4c1a7945b076 2025-08-14T21:18:49.8633165Z with: 2025-08-14T21:18:49.8633581Z mask-password: true 2025-08-14T21:18:49.8634060Z registry-type: private 2025-08-14T21:18:49.8634593Z skip-logout: false 2025-08-14T21:18:49.8635018Z env: 2025-08-14T21:18:49.8635422Z GIT_DEFAULT_BRANCH: main 2025-08-14T21:18:49.8636152Z RUNNER_ARTIFACT_DIR: /home/pytorchci/actions-runner/_work/_temp/artifacts 2025-08-14T21:18:49.8637180Z RUNNER_TEST_RESULTS_DIR: /home/pytorchci/actions-runner/_work/_temp/test-results 2025-08-14T21:18:49.8638157Z RUNNER_DOCS_DIR: /home/pytorchci/actions-runner/_work/_temp/docs 2025-08-14T21:18:49.8639831Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --device /dev/dri --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-08-14T21:18:49.8641510Z AWS_DEFAULT_REGION: us-east-1 2025-08-14T21:18:49.8642144Z AWS_REGION: us-east-1 2025-08-14T21:18:49.8643172Z AWS_ACCESS_KEY_ID: *** 2025-08-14T21:18:49.8643869Z AWS_SECRET_ACCESS_KEY: *** 2025-08-14T21:18:49.8654147Z AWS_SESSION_TOKEN: *** 2025-08-14T21:18:49.8654618Z ##[endgroup] 2025-08-14T21:18:50.3955063Z Logging into registry 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-08-14T21:18:51.1487402Z ##[group]Run pytorch/test-infra/.github/actions/calculate-docker-image@main 2025-08-14T21:18:51.1488232Z with: 2025-08-14T21:18:51.1489471Z docker-image-name: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-rocm-n-py3-bfa89110622ba7202628e9faac705f183070defe 2025-08-14T21:18:51.1490863Z use-custom-docker-registry: true 2025-08-14T21:18:51.1491417Z docker-build-dir: .ci/docker 2025-08-14T21:18:51.1491930Z docker-build-script: ./build.sh 2025-08-14T21:18:51.1492446Z working-directory: . 2025-08-14T21:18:51.1493057Z docker-registry: 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-08-14T21:18:51.1494181Z force-push: false 2025-08-14T21:18:51.1494588Z env: 2025-08-14T21:18:51.1494971Z GIT_DEFAULT_BRANCH: main 2025-08-14T21:18:51.1495691Z RUNNER_ARTIFACT_DIR: /home/pytorchci/actions-runner/_work/_temp/artifacts 2025-08-14T21:18:51.1496720Z RUNNER_TEST_RESULTS_DIR: /home/pytorchci/actions-runner/_work/_temp/test-results 2025-08-14T21:18:51.1497741Z RUNNER_DOCS_DIR: /home/pytorchci/actions-runner/_work/_temp/docs 2025-08-14T21:18:51.1499400Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --device /dev/dri --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-08-14T21:18:51.1500899Z AWS_DEFAULT_REGION: us-east-1 2025-08-14T21:18:51.1501399Z AWS_REGION: us-east-1 2025-08-14T21:18:51.1502091Z AWS_ACCESS_KEY_ID: *** 2025-08-14T21:18:51.1502835Z AWS_SECRET_ACCESS_KEY: *** 2025-08-14T21:18:51.1513118Z AWS_SESSION_TOKEN: *** 2025-08-14T21:18:51.1513577Z ##[endgroup] 2025-08-14T21:18:51.1550214Z ##[group]Run set -ex 2025-08-14T21:18:51.1550746Z set -ex 2025-08-14T21:18:51.1551151Z  2025-08-14T21:18:51.1551857Z # If the docker build directory or the build script doesn't exist, the action will 2025-08-14T21:18:51.1552994Z # gracefully return the docker image name as it is. Pulling docker image in Linux 2025-08-14T21:18:51.1553960Z # job could then download the pre-built image as usual 2025-08-14T21:18:51.1555129Z if [[ -d "${DOCKER_BUILD_DIR}" ]] && [[ -f "${DOCKER_BUILD_DIR}/${DOCKER_BUILD_SCRIPT}" ]] && [[ "${USE_CUSTOM_DOCKER_REGISTRY}" == "true" ]]; then 2025-08-14T21:18:51.1556206Z  echo "skip=false" >> "${GITHUB_OUTPUT}" 2025-08-14T21:18:51.1556779Z else 2025-08-14T21:18:51.1557248Z  echo "skip=true" >> "${GITHUB_OUTPUT}" 2025-08-14T21:18:51.1558014Z  echo "docker-image=${DOCKER_IMAGE_NAME}" >> "${GITHUB_OUTPUT}" 2025-08-14T21:18:51.1558705Z  2025-08-14T21:18:51.1559636Z  echo "Not using custom ECR registry. Either it was not requested or there is no Docker build script in the ${REPO_NAME} repo..." 2025-08-14T21:18:51.1560857Z  exit 0 2025-08-14T21:18:51.1561260Z fi 2025-08-14T21:18:51.1561628Z  2025-08-14T21:18:51.1562232Z if [[ "${DOCKER_IMAGE_NAME}" == *"${DOCKER_REGISTRY}/${REPO_NAME}"* ]]; then 2025-08-14T21:18:51.1563246Z  # The docker image name already includes the ECR prefix and tag, so we can just 2025-08-14T21:18:51.1564153Z  # use it as it is, but first let's extract the tag 2025-08-14T21:18:51.1564991Z  DOCKER_TAG=$(echo "${DOCKER_IMAGE_NAME}" | awk -F '[:,]' '{print $2}') 2025-08-14T21:18:51.1565865Z  echo "docker-tag=${DOCKER_TAG}" >> "${GITHUB_OUTPUT}" 2025-08-14T21:18:51.1566699Z  echo "docker-image=${DOCKER_IMAGE_NAME}" >> "${GITHUB_OUTPUT}" 2025-08-14T21:18:51.1567407Z else 2025-08-14T21:18:51.1567877Z  if [[ "${DOCKER_IMAGE_NAME}" == *:* ]]; then 2025-08-14T21:18:51.1568541Z  CUSTOM_TAG_PREFIX=${DOCKER_IMAGE_NAME#*:} 2025-08-14T21:18:51.1569233Z  DOCKER_IMAGE_NAME=${DOCKER_IMAGE_NAME%%:*} 2025-08-14T21:18:51.1569818Z  fi 2025-08-14T21:18:51.1570999Z  DOCKER_TAG=${CUSTOM_TAG_PREFIX:+${CUSTOM_TAG_PREFIX}-}$(git rev-parse HEAD:"${DOCKER_BUILD_DIR}") 2025-08-14T21:18:51.1572051Z  echo "docker-tag=${DOCKER_TAG}" >> "${GITHUB_OUTPUT}" 2025-08-14T21:18:51.1573139Z  echo "docker-image=${DOCKER_REGISTRY}/${REPO_NAME}/${DOCKER_IMAGE_NAME}:${DOCKER_TAG}" >> "${GITHUB_OUTPUT}" 2025-08-14T21:18:51.1574310Z  echo "custom-tag-prefix=${CUSTOM_TAG_PREFIX}" >> "${GITHUB_OUTPUT}" 2025-08-14T21:18:51.1575030Z fi 2025-08-14T21:18:51.1628045Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-08-14T21:18:51.1628718Z env: 2025-08-14T21:18:51.1629113Z GIT_DEFAULT_BRANCH: main 2025-08-14T21:18:51.1629832Z RUNNER_ARTIFACT_DIR: /home/pytorchci/actions-runner/_work/_temp/artifacts 2025-08-14T21:18:51.1631315Z RUNNER_TEST_RESULTS_DIR: /home/pytorchci/actions-runner/_work/_temp/test-results 2025-08-14T21:18:51.1632300Z RUNNER_DOCS_DIR: /home/pytorchci/actions-runner/_work/_temp/docs 2025-08-14T21:18:51.1633975Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --device /dev/dri --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-08-14T21:18:51.1635512Z AWS_DEFAULT_REGION: us-east-1 2025-08-14T21:18:51.1636023Z AWS_REGION: us-east-1 2025-08-14T21:18:51.1636612Z AWS_ACCESS_KEY_ID: *** 2025-08-14T21:18:51.1637290Z AWS_SECRET_ACCESS_KEY: *** 2025-08-14T21:18:51.1647693Z AWS_SESSION_TOKEN: *** 2025-08-14T21:18:51.1648174Z REPO_NAME: pytorch 2025-08-14T21:18:51.1649454Z DOCKER_IMAGE_NAME: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-rocm-n-py3-bfa89110622ba7202628e9faac705f183070defe 2025-08-14T21:18:51.1650819Z DOCKER_BUILD_DIR: .ci/docker 2025-08-14T21:18:51.1651329Z DOCKER_BUILD_SCRIPT: ./build.sh 2025-08-14T21:18:51.1651995Z DOCKER_REGISTRY: 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-08-14T21:18:51.1652698Z USE_CUSTOM_DOCKER_REGISTRY: true 2025-08-14T21:18:51.1653209Z CUSTOM_TAG_PREFIX: 2025-08-14T21:18:51.1653647Z ##[endgroup] 2025-08-14T21:18:51.1732143Z + [[ -d .ci/docker ]] 2025-08-14T21:18:51.1732661Z + [[ -f .ci/docker/./build.sh ]] 2025-08-14T21:18:51.1733187Z + [[ true == \t\r\u\e ]] 2025-08-14T21:18:51.1733675Z + echo skip=false 2025-08-14T21:18:51.1735358Z + [[ 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-rocm-n-py3-bfa89110622ba7202628e9faac705f183070defe == *\3\0\8\5\3\5\3\8\5\1\1\4\.\d\k\r\.\e\c\r\.\u\s\-\e\a\s\t\-\1\.\a\m\a\z\o\n\a\w\s\.\c\o\m\/\p\y\t\o\r\c\h* ]] 2025-08-14T21:18:51.1747409Z ++ echo 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-rocm-n-py3-bfa89110622ba7202628e9faac705f183070defe 2025-08-14T21:18:51.1748845Z ++ awk -F '[:,]' '{print $2}' 2025-08-14T21:18:51.1791114Z + DOCKER_TAG=pytorch-linux-jammy-rocm-n-py3-bfa89110622ba7202628e9faac705f183070defe 2025-08-14T21:18:51.1792493Z + echo docker-tag=pytorch-linux-jammy-rocm-n-py3-bfa89110622ba7202628e9faac705f183070defe 2025-08-14T21:18:51.1794792Z + echo docker-image=308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-rocm-n-py3-bfa89110622ba7202628e9faac705f183070defe 2025-08-14T21:18:51.1847471Z ##[group]Run set +e 2025-08-14T21:18:51.1847962Z set +e 2025-08-14T21:18:51.1848350Z set -x 2025-08-14T21:18:51.1848709Z  2025-08-14T21:18:51.1849068Z login() { 2025-08-14T21:18:51.1849880Z  aws ecr get-login-password --region us-east-1 | docker login -u AWS --password-stdin "$1" 2025-08-14T21:18:51.1850726Z } 2025-08-14T21:18:51.1851078Z  2025-08-14T21:18:51.1851430Z retry () { 2025-08-14T21:18:51.1851870Z  $* || (sleep 1 && $*) || (sleep 2 && $*) 2025-08-14T21:18:51.1852413Z } 2025-08-14T21:18:51.1852753Z  2025-08-14T21:18:51.1853136Z retry login "${DOCKER_REGISTRY}" 2025-08-14T21:18:51.1853626Z  2025-08-14T21:18:51.1853989Z START_TIME=$(date +%s) 2025-08-14T21:18:51.1854476Z # Wait up to 120 minutes 2025-08-14T21:18:51.1855437Z while [[ $(( $(date +%s) - 7200 )) -lt $START_TIME ]]; do 2025-08-14T21:18:51.1856263Z  # Check if image already exists, if it does then skip building it 2025-08-14T21:18:51.1857051Z  if docker manifest inspect "${DOCKER_IMAGE}"; then 2025-08-14T21:18:51.1857640Z  exit 0 2025-08-14T21:18:51.1858028Z  fi 2025-08-14T21:18:51.1858383Z  2025-08-14T21:18:51.1859002Z  # NB: This flag is used by Docker build workflow to push the image to ECR, so we can 2025-08-14T21:18:51.1860041Z  # use this to differentiate between the Docker build and regular build jobs. For the 2025-08-14T21:18:51.1861361Z  # latter, it will wait for the Docker images to become available before continuing 2025-08-14T21:18:51.1862182Z  if [ "${DOCKER_PUSH:-false}" == "true" ]; then 2025-08-14T21:18:51.1862836Z  # It's a Docker build job, let's build the image 2025-08-14T21:18:51.1863448Z  break 2025-08-14T21:18:51.1863874Z  else 2025-08-14T21:18:51.1864465Z  # It's a regular build job, wait for the image to become available 2025-08-14T21:18:51.1865177Z  sleep 300 2025-08-14T21:18:51.1865607Z  fi 2025-08-14T21:18:51.1865982Z done 2025-08-14T21:18:51.1866350Z  2025-08-14T21:18:51.1866961Z # NB: This part requires a full checkout. Otherwise, the merge base will 2025-08-14T21:18:51.1867927Z # be empty. The default action would be to continue rebuild the image 2025-08-14T21:18:51.1868785Z if [[ "$BASE_REVISION" = "$(git rev-parse HEAD)" ]]; then 2025-08-14T21:18:51.1869563Z  # if we're on the base branch then use the parent commit 2025-08-14T21:18:51.1870242Z  MERGE_BASE=$(git rev-parse HEAD~) 2025-08-14T21:18:51.1870784Z else 2025-08-14T21:18:51.1871323Z  # otherwise we're on a PR, so use the most recent base commit 2025-08-14T21:18:51.1872129Z  MERGE_BASE=$(git merge-base HEAD "$BASE_REVISION") 2025-08-14T21:18:51.1872736Z fi 2025-08-14T21:18:51.1873105Z  2025-08-14T21:18:51.1873520Z if [[ -z "${MERGE_BASE}" ]]; then 2025-08-14T21:18:51.1874138Z  echo "rebuild=true" >> "${GITHUB_OUTPUT}" 2025-08-14T21:18:51.1874702Z  2025-08-14T21:18:51.1875480Z  echo "Finding merge base only works with full checkout, please set fetch-depth to 0, continuing ..." 2025-08-14T21:18:51.1876399Z  exit 0 2025-08-14T21:18:51.1876784Z fi 2025-08-14T21:18:51.1877152Z  2025-08-14T21:18:51.1877682Z if ! git rev-parse "${MERGE_BASE}:${DOCKER_BUILD_DIR}"; then 2025-08-14T21:18:51.1878817Z  echo "Directory '${DOCKER_BUILD_DIR}' not found in commit $MERGE_BASE, you should rebase onto a more recent commit" 2025-08-14T21:18:51.1879791Z  exit 1 2025-08-14T21:18:51.1880167Z fi 2025-08-14T21:18:51.1880650Z  2025-08-14T21:18:51.1881270Z PREVIOUS_DOCKER_TAG=$(git rev-parse "${MERGE_BASE}:${DOCKER_BUILD_DIR}") 2025-08-14T21:18:51.1882367Z # If no image exists but the hash is the same as the previous hash then we should error out here 2025-08-14T21:18:51.1883362Z if [[ "${PREVIOUS_DOCKER_TAG}" == "${DOCKER_TAG}" ]]; then 2025-08-14T21:18:51.1884498Z  echo "WARNING: Something has gone wrong and the previous image isn't available for the merge-base of your branch" 2025-08-14T21:18:51.1885755Z  echo " Will re-build docker image to store in local cache, TTS may be longer" 2025-08-14T21:18:51.1886531Z fi 2025-08-14T21:18:51.1886906Z  2025-08-14T21:18:51.1887355Z echo "rebuild=true" >> "${GITHUB_OUTPUT}" 2025-08-14T21:18:51.1931602Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-08-14T21:18:51.1932273Z env: 2025-08-14T21:18:51.1932675Z GIT_DEFAULT_BRANCH: main 2025-08-14T21:18:51.1933726Z RUNNER_ARTIFACT_DIR: /home/pytorchci/actions-runner/_work/_temp/artifacts 2025-08-14T21:18:51.1934775Z RUNNER_TEST_RESULTS_DIR: /home/pytorchci/actions-runner/_work/_temp/test-results 2025-08-14T21:18:51.1935722Z RUNNER_DOCS_DIR: /home/pytorchci/actions-runner/_work/_temp/docs 2025-08-14T21:18:51.1937321Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --device /dev/dri --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-08-14T21:18:51.1938751Z AWS_DEFAULT_REGION: us-east-1 2025-08-14T21:18:51.1939229Z AWS_REGION: us-east-1 2025-08-14T21:18:51.1939778Z AWS_ACCESS_KEY_ID: *** 2025-08-14T21:18:51.1940713Z AWS_SECRET_ACCESS_KEY: *** 2025-08-14T21:18:51.1950364Z AWS_SESSION_TOKEN: *** 2025-08-14T21:18:51.1950839Z DOCKER_BUILD_DIR: .ci/docker 2025-08-14T21:18:51.1951413Z BASE_REVISION: 1fc683cf17c8c673044538d10266c00f92987be2 2025-08-14T21:18:51.1952782Z DOCKER_IMAGE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-rocm-n-py3-bfa89110622ba7202628e9faac705f183070defe 2025-08-14T21:18:51.1954471Z DOCKER_TAG: pytorch-linux-jammy-rocm-n-py3-bfa89110622ba7202628e9faac705f183070defe 2025-08-14T21:18:51.1955517Z DOCKER_REGISTRY: 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-08-14T21:18:51.1956191Z DOCKER_PUSH: 2025-08-14T21:18:51.1956597Z ##[endgroup] 2025-08-14T21:18:51.2032102Z + retry login 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-08-14T21:18:51.2032908Z + login 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-08-14T21:18:51.2037676Z + aws ecr get-login-password --region us-east-1 2025-08-14T21:18:51.2039351Z + docker login -u AWS --password-stdin 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-08-14T21:18:52.7327405Z WARNING! Your password will be stored unencrypted in /home/pytorchci/.docker/config.json. 2025-08-14T21:18:52.7328510Z Configure a credential helper to remove this warning. See 2025-08-14T21:18:52.7329542Z https://docs.docker.com/engine/reference/commandline/login/#credentials-store 2025-08-14T21:18:52.7330197Z 2025-08-14T21:18:52.7336414Z Login Succeeded 2025-08-14T21:18:52.7382877Z ++ date +%s 2025-08-14T21:18:52.7402004Z + START_TIME=1755206332 2025-08-14T21:18:52.7411297Z ++ date +%s 2025-08-14T21:18:52.7450118Z + [[ 1755199132 -lt 1755206332 ]] 2025-08-14T21:18:52.7451619Z + docker manifest inspect 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-rocm-n-py3-bfa89110622ba7202628e9faac705f183070defe 2025-08-14T21:18:54.2425510Z { 2025-08-14T21:18:54.2426642Z "schemaVersion": 2, 2025-08-14T21:18:54.2427495Z "mediaType": "application/vnd.docker.distribution.manifest.v2+json", 2025-08-14T21:18:54.2428338Z "config": { 2025-08-14T21:18:54.2428947Z "mediaType": "application/vnd.docker.container.image.v1+json", 2025-08-14T21:18:54.2429673Z "size": 28646, 2025-08-14T21:18:54.2430402Z "digest": "sha256:460acde6d8cbae0e961d7e3597d10525c881798e91edaf1ab0df7a07e0905a24" 2025-08-14T21:18:54.2431217Z }, 2025-08-14T21:18:54.2431615Z "layers": [ 2025-08-14T21:18:54.2431997Z { 2025-08-14T21:18:54.2432578Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-08-14T21:18:54.2433291Z "size": 30448173, 2025-08-14T21:18:54.2434016Z "digest": "sha256:660ffc76f83b006444a5731b215acc2e35138d8be5cac8ed1ffd40f947117495" 2025-08-14T21:18:54.2434808Z }, 2025-08-14T21:18:54.2435146Z { 2025-08-14T21:18:54.2435699Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-08-14T21:18:54.2436400Z "size": 1554, 2025-08-14T21:18:54.2437093Z "digest": "sha256:c7b4a852a45516e27a9256df90878663d770f96d271d6155d43be78cc5225eef" 2025-08-14T21:18:54.2437886Z }, 2025-08-14T21:18:54.2438203Z { 2025-08-14T21:18:54.2438752Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-08-14T21:18:54.2439457Z "size": 313280717, 2025-08-14T21:18:54.2440189Z "digest": "sha256:c5628c7e7b33c6491651d9cce9ee596279d693cc7c3dcfa912aca94f75341cf7" 2025-08-14T21:18:54.2441440Z }, 2025-08-14T21:18:54.2442332Z { 2025-08-14T21:18:54.2442932Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-08-14T21:18:54.2443637Z "size": 704, 2025-08-14T21:18:54.2444349Z "digest": "sha256:836ab08052e8eb2bae68e69ae086fd23a5f04a8491c320718ab47f84f03aebb1" 2025-08-14T21:18:54.2445126Z }, 2025-08-14T21:18:54.2445456Z { 2025-08-14T21:18:54.2445990Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-08-14T21:18:54.2446665Z "size": 1218, 2025-08-14T21:18:54.2447350Z "digest": "sha256:0802c59adcde1d758d228271cc5f1bf6acbc597053ff8e8f3c1073d59a85bf9d" 2025-08-14T21:18:54.2448130Z }, 2025-08-14T21:18:54.2448790Z { 2025-08-14T21:18:54.2449336Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-08-14T21:18:54.2450148Z "size": 485, 2025-08-14T21:18:54.2450824Z "digest": "sha256:e97311a6a967664cbe10c5027a1ec60c514caa9a1160167d8363088fd1f9fe09" 2025-08-14T21:18:54.2451587Z }, 2025-08-14T21:18:54.2451914Z { 2025-08-14T21:18:54.2452463Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-08-14T21:18:54.2453151Z "size": 110343847, 2025-08-14T21:18:54.2453874Z "digest": "sha256:fe9f48a18b7ef6bdbd006ea4cb784b7e73e0daa2409a1a053c02bafed7993529" 2025-08-14T21:18:54.2454667Z }, 2025-08-14T21:18:54.2454985Z { 2025-08-14T21:18:54.2455525Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-08-14T21:18:54.2456185Z "size": 4242, 2025-08-14T21:18:54.2456886Z "digest": "sha256:4d3edeebe7da3872c026bd4e61d5f63aee7f760e1c8cf0eb226e1d95e528735b" 2025-08-14T21:18:54.2457683Z }, 2025-08-14T21:18:54.2458015Z { 2025-08-14T21:18:54.2458565Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-08-14T21:18:54.2459237Z "size": 1709, 2025-08-14T21:18:54.2459930Z "digest": "sha256:5a5cc76ada432cccf7d18e0eb79379afb95deaaa7afec482406267924d291ae4" 2025-08-14T21:18:54.2460733Z }, 2025-08-14T21:18:54.2461057Z { 2025-08-14T21:18:54.2461603Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-08-14T21:18:54.2462290Z "size": 724, 2025-08-14T21:18:54.2462952Z "digest": "sha256:fc6b37d40530f2c5339430321eab67ae1e2e87e997587c7bc8c41504464208f9" 2025-08-14T21:18:54.2463719Z }, 2025-08-14T21:18:54.2464038Z { 2025-08-14T21:18:54.2464572Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-08-14T21:18:54.2465267Z "size": 3244135942, 2025-08-14T21:18:54.2465995Z "digest": "sha256:ee623dde94b52da337d3cc0e356115ccbc33b9b5ec03f9e06fce9f05ba3572ad" 2025-08-14T21:18:54.2466786Z }, 2025-08-14T21:18:54.2467109Z { 2025-08-14T21:18:54.2467654Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-08-14T21:18:54.2468352Z "size": 380, 2025-08-14T21:18:54.2469023Z "digest": "sha256:d6226eb61f823984003d5ac28f4d66fec9b27baf5d54a9513286483f5912cd88" 2025-08-14T21:18:54.2469778Z }, 2025-08-14T21:18:54.2470105Z { 2025-08-14T21:18:54.2470664Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-08-14T21:18:54.2471375Z "size": 234112, 2025-08-14T21:18:54.2472088Z "digest": "sha256:6de917b18d3a540adb3b7e446e9802f89a69d39f804f40d25751639b6cea7c33" 2025-08-14T21:18:54.2472882Z }, 2025-08-14T21:18:54.2473210Z { 2025-08-14T21:18:54.2473772Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-08-14T21:18:54.2474459Z "size": 793, 2025-08-14T21:18:54.2475154Z "digest": "sha256:76a69b57b6837bef07dbc1b481cf28a62dfd7c7063219d9f6e0d0d63067653c7" 2025-08-14T21:18:54.2475960Z }, 2025-08-14T21:18:54.2476279Z { 2025-08-14T21:18:54.2476823Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-08-14T21:18:54.2477787Z "size": 106, 2025-08-14T21:18:54.2478474Z "digest": "sha256:451d7a75db7137d6ca0ccb4e3635d4a16f3287db4cc2a3a4d3dbc2d3b0266f06" 2025-08-14T21:18:54.2479278Z }, 2025-08-14T21:18:54.2479609Z { 2025-08-14T21:18:54.2480156Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-08-14T21:18:54.2480969Z "size": 1496, 2025-08-14T21:18:54.2482031Z "digest": "sha256:acf5babd87f23aa905883eb434073e9a00ff41679134f2f4827dd86949f5a9d9" 2025-08-14T21:18:54.2482825Z }, 2025-08-14T21:18:54.2483155Z { 2025-08-14T21:18:54.2483694Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-08-14T21:18:54.2484376Z "size": 453553322, 2025-08-14T21:18:54.2485107Z "digest": "sha256:06e2fbb6cb2eeccd028bfaabb2e674bf25c9d84d3fa91c9e2f03bcf765f92599" 2025-08-14T21:18:54.2485894Z }, 2025-08-14T21:18:54.2486209Z { 2025-08-14T21:18:54.2486735Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-08-14T21:18:54.2487413Z "size": 163, 2025-08-14T21:18:54.2488378Z "digest": "sha256:87ddced081ff535d58b842d961844bf1b5979ddc8ad7f2d31f606cd6c73ec758" 2025-08-14T21:18:54.2489152Z }, 2025-08-14T21:18:54.2489472Z { 2025-08-14T21:18:54.2490004Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-08-14T21:18:54.2490687Z "size": 2484, 2025-08-14T21:18:54.2491372Z "digest": "sha256:1eb879bde5e5b9ce59b91eacb794a63e1180a55615094477886cba5016e993d7" 2025-08-14T21:18:54.2492137Z }, 2025-08-14T21:18:54.2492447Z { 2025-08-14T21:18:54.2492989Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-08-14T21:18:54.2493675Z "size": 8102124021, 2025-08-14T21:18:54.2494386Z "digest": "sha256:c3717fc4dc94ce7fdd93c87843ab97f72e05c0e87203ae15e48b87935b189a27" 2025-08-14T21:18:54.2495161Z }, 2025-08-14T21:18:54.2495477Z { 2025-08-14T21:18:54.2496006Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-08-14T21:18:54.2496685Z "size": 105, 2025-08-14T21:18:54.2497343Z "digest": "sha256:0c9ed178fd2ebc5a6630c12887a96b3f9884b85144d726435e77d33ab1ee11ab" 2025-08-14T21:18:54.2498118Z }, 2025-08-14T21:18:54.2498423Z { 2025-08-14T21:18:54.2498947Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-08-14T21:18:54.2499630Z "size": 612, 2025-08-14T21:18:54.2500312Z "digest": "sha256:b4626f6a341ca316ed5525f9c12e667b82f444b8cf673a0ad0bf8b78ccfd5bc9" 2025-08-14T21:18:54.2501089Z }, 2025-08-14T21:18:54.2501410Z { 2025-08-14T21:18:54.2501935Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-08-14T21:18:54.2502616Z "size": 677677354, 2025-08-14T21:18:54.2503315Z "digest": "sha256:4bf07717a6a2b5fd5a05d41b16dedebb413da5d50ae09457e112d5b42ecacbe1" 2025-08-14T21:18:54.2504086Z }, 2025-08-14T21:18:54.2504421Z { 2025-08-14T21:18:54.2504947Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-08-14T21:18:54.2505637Z "size": 111, 2025-08-14T21:18:54.2506297Z "digest": "sha256:f258db683e793ef4f283958f771de6376461ec4c2d2ac2947f2ad72a998f45ad" 2025-08-14T21:18:54.2507074Z }, 2025-08-14T21:18:54.2507398Z { 2025-08-14T21:18:54.2507926Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-08-14T21:18:54.2508607Z "size": 1555, 2025-08-14T21:18:54.2509294Z "digest": "sha256:2d70f6b6782f0d7c3342f35a89c20ca75e63acc6cffff00c2f5b0c4ae16aa20d" 2025-08-14T21:18:54.2510088Z }, 2025-08-14T21:18:54.2510456Z { 2025-08-14T21:18:54.2511004Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-08-14T21:18:54.2511682Z "size": 107, 2025-08-14T21:18:54.2512373Z "digest": "sha256:cfbe9750bae7f94f985be33742a90ac52adfdbe5870aa09b09ad6ce80db6541b" 2025-08-14T21:18:54.2513153Z }, 2025-08-14T21:18:54.2513470Z { 2025-08-14T21:18:54.2514001Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-08-14T21:18:54.2514670Z "size": 166, 2025-08-14T21:18:54.2515339Z "digest": "sha256:8d2ff9ef7f4c24e23debf7d02ee44f5acc6e3ed95168f970dfa061e270556036" 2025-08-14T21:18:54.2516141Z }, 2025-08-14T21:18:54.2516458Z { 2025-08-14T21:18:54.2516983Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-08-14T21:18:54.2517670Z "size": 2838566, 2025-08-14T21:18:54.2518379Z "digest": "sha256:59a73147541eccf1318db0147d0e40952ff983b3515bbc1b2586caa5d370c2e5" 2025-08-14T21:18:54.2519143Z }, 2025-08-14T21:18:54.2521904Z { 2025-08-14T21:18:54.2522523Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-08-14T21:18:54.2523239Z "size": 107, 2025-08-14T21:18:54.2523920Z "digest": "sha256:80e89acaf7bc105eb24090a7638ae51ec2ec270cf4e181e3002f3ef6cea7d2dc" 2025-08-14T21:18:54.2524696Z }, 2025-08-14T21:18:54.2525011Z { 2025-08-14T21:18:54.2525523Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-08-14T21:18:54.2526201Z "size": 802, 2025-08-14T21:18:54.2526872Z "digest": "sha256:c9c2b424b8e08d943dc259a3796d66eede3a1e93a6460df5db132c0036d3d6af" 2025-08-14T21:18:54.2527644Z }, 2025-08-14T21:18:54.2528261Z { 2025-08-14T21:18:54.2528781Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-08-14T21:18:54.2529456Z "size": 26113684, 2025-08-14T21:18:54.2530121Z "digest": "sha256:26060b8c841ee0f2c22485c0d51390219230b08ce81ce14c500e40be58488ac5" 2025-08-14T21:18:54.2530853Z }, 2025-08-14T21:18:54.2531160Z { 2025-08-14T21:18:54.2531694Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-08-14T21:18:54.2532353Z "size": 104, 2025-08-14T21:18:54.2533047Z "digest": "sha256:58b4bcac07b333b658450cb576e2bb832b11cd699d76f1f3846e739deb4a5738" 2025-08-14T21:18:54.2533807Z }, 2025-08-14T21:18:54.2534112Z { 2025-08-14T21:18:54.2534642Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-08-14T21:18:54.2535314Z "size": 425, 2025-08-14T21:18:54.2535973Z "digest": "sha256:18c4ed1ec491095788e352ae018afd84de0f251fbcfb8f74d5d893e1e9ab196d" 2025-08-14T21:18:54.2536740Z }, 2025-08-14T21:18:54.2537051Z { 2025-08-14T21:18:54.2537583Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-08-14T21:18:54.2538258Z "size": 19308715, 2025-08-14T21:18:54.2538937Z "digest": "sha256:4d7414e539c3ad1ab34e46032ca390dae2d25805222308bb02ecc1b35713b1a2" 2025-08-14T21:18:54.2539704Z }, 2025-08-14T21:18:54.2540019Z { 2025-08-14T21:18:54.2540537Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-08-14T21:18:54.2541205Z "size": 691, 2025-08-14T21:18:54.2541862Z "digest": "sha256:6738ba83282e002d92bff3d2b4951e3c1a67f5ec2c1bad2fd780c2f5d444748f" 2025-08-14T21:18:54.2542620Z }, 2025-08-14T21:18:54.2542939Z { 2025-08-14T21:18:54.2543459Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-08-14T21:18:54.2544123Z "size": 724, 2025-08-14T21:18:54.2544766Z "digest": "sha256:fc6b37d40530f2c5339430321eab67ae1e2e87e997587c7bc8c41504464208f9" 2025-08-14T21:18:54.2545527Z }, 2025-08-14T21:18:54.2545840Z { 2025-08-14T21:18:54.2546352Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-08-14T21:18:54.2547026Z "size": 116, 2025-08-14T21:18:54.2547682Z "digest": "sha256:dfb0f24886393e1d394f1f433dc9346026679dafd7a60c3a93de17d94078c1ca" 2025-08-14T21:18:54.2548438Z }, 2025-08-14T21:18:54.2548749Z { 2025-08-14T21:18:54.2549261Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-08-14T21:18:54.2549935Z "size": 136, 2025-08-14T21:18:54.2550597Z "digest": "sha256:dc833b0762f2e144670a660f6b7ce62cec71a5fdd24df4e67b5c6173d5834451" 2025-08-14T21:18:54.2551358Z }, 2025-08-14T21:18:54.2551668Z { 2025-08-14T21:18:54.2552190Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-08-14T21:18:54.2552846Z "size": 139, 2025-08-14T21:18:54.2553530Z "digest": "sha256:8827df8ca2da347e0032d1bff3b0312437f711c5d0b5f2164f8a60c3368a9827" 2025-08-14T21:18:54.2554288Z }, 2025-08-14T21:18:54.2554601Z { 2025-08-14T21:18:54.2555127Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-08-14T21:18:54.2555804Z "size": 32, 2025-08-14T21:18:54.2556469Z "digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1" 2025-08-14T21:18:54.2557244Z }, 2025-08-14T21:18:54.2557556Z { 2025-08-14T21:18:54.2558092Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-08-14T21:18:54.2558768Z "size": 214, 2025-08-14T21:18:54.2559710Z "digest": "sha256:cd36d0dbfc741c82fee3a71ab009d02c22329a76463482a521b51f4e183248b3" 2025-08-14T21:18:54.2560648Z }, 2025-08-14T21:18:54.2560962Z { 2025-08-14T21:18:54.2561505Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-08-14T21:18:54.2562193Z "size": 347, 2025-08-14T21:18:54.2562853Z "digest": "sha256:4ba8e7a736c8199931fd7ff9931a5f17b7b931d0383a3e158f1b12b191a1d250" 2025-08-14T21:18:54.2563631Z }, 2025-08-14T21:18:54.2563934Z { 2025-08-14T21:18:54.2564472Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-08-14T21:18:54.2565153Z "size": 88301, 2025-08-14T21:18:54.2566237Z "digest": "sha256:d3f2353a1f26b127b1869d0eb92a134748c89b45f5defec4fcb2c356bc73c540" 2025-08-14T21:18:54.2567015Z }, 2025-08-14T21:18:54.2567328Z { 2025-08-14T21:18:54.2567873Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-08-14T21:18:54.2568563Z "size": 106, 2025-08-14T21:18:54.2569240Z "digest": "sha256:22c681dc84cd6e1f55b9874ead0fb9d31d471d5cb641391a3b3dd979c1502cc6" 2025-08-14T21:18:54.2570028Z }, 2025-08-14T21:18:54.2570351Z { 2025-08-14T21:18:54.2570884Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-08-14T21:18:54.2571565Z "size": 1666, 2025-08-14T21:18:54.2572217Z "digest": "sha256:519f3067ec6a99456e6fb280bd4b79e2478677684831299d601c9493523b67cc" 2025-08-14T21:18:54.2572961Z }, 2025-08-14T21:18:54.2573260Z { 2025-08-14T21:18:54.2573780Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-08-14T21:18:54.2574454Z "size": 724, 2025-08-14T21:18:54.2575105Z "digest": "sha256:fc6b37d40530f2c5339430321eab67ae1e2e87e997587c7bc8c41504464208f9" 2025-08-14T21:18:54.2575869Z }, 2025-08-14T21:18:54.2576175Z { 2025-08-14T21:18:54.2576693Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-08-14T21:18:54.2577368Z "size": 139, 2025-08-14T21:18:54.2578040Z "digest": "sha256:6a8272f38d227d78a8f9b1ddf10bb73941fbd77e71ae3c582c27e08c0f6261c2" 2025-08-14T21:18:54.2578808Z }, 2025-08-14T21:18:54.2579113Z { 2025-08-14T21:18:54.2579640Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-08-14T21:18:54.2580310Z "size": 120, 2025-08-14T21:18:54.2580938Z "digest": "sha256:973e2484504fec2c1292e24122716507e8ca4717a35c142358174b60c0e401cf" 2025-08-14T21:18:54.2581670Z }, 2025-08-14T21:18:54.2581972Z { 2025-08-14T21:18:54.2582489Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-08-14T21:18:54.2583166Z "size": 5373253071, 2025-08-14T21:18:54.2583865Z "digest": "sha256:ac32a18ead25409bbcd67a8636c32d8af5852a347ea5503c7c9c6234cacf45a1" 2025-08-14T21:18:54.2584645Z }, 2025-08-14T21:18:54.2584951Z { 2025-08-14T21:18:54.2585465Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-08-14T21:18:54.2586138Z "size": 175, 2025-08-14T21:18:54.2586856Z "digest": "sha256:76846613d7067ca9de6a73db198902ca46b403bc78e2627f3f6fb805e1bbaa0a" 2025-08-14T21:18:54.2587628Z }, 2025-08-14T21:18:54.2587939Z { 2025-08-14T21:18:54.2588458Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-08-14T21:18:54.2589132Z "size": 1896, 2025-08-14T21:18:54.2589800Z "digest": "sha256:dcf1e01c98d6a6f72674d79a4e8e4047b54796576cd06ad682c225a92820a8f5" 2025-08-14T21:18:54.2590560Z }, 2025-08-14T21:18:54.2590869Z { 2025-08-14T21:18:54.2591395Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-08-14T21:18:54.2592058Z "size": 197509462, 2025-08-14T21:18:54.2592743Z "digest": "sha256:a88175fa027cfade948a7004100834f68828b13c0f21b314491c37084fbbe7e5" 2025-08-14T21:18:54.2593507Z }, 2025-08-14T21:18:54.2593811Z { 2025-08-14T21:18:54.2594329Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-08-14T21:18:54.2595000Z "size": 303, 2025-08-14T21:18:54.2595645Z "digest": "sha256:7f5277f691672469f431fd90a8c2bb702c6c68333f6be2cff868f00e416c5a1a" 2025-08-14T21:18:54.2596396Z }, 2025-08-14T21:18:54.2597005Z { 2025-08-14T21:18:54.2597540Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-08-14T21:18:54.2598206Z "size": 32, 2025-08-14T21:18:54.2598866Z "digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1" 2025-08-14T21:18:54.2599641Z }, 2025-08-14T21:18:54.2599954Z { 2025-08-14T21:18:54.2600582Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-08-14T21:18:54.2601263Z "size": 108, 2025-08-14T21:18:54.2601927Z "digest": "sha256:d0b078cebcc48c1f46a5d1286ae68a90c295cb6b81f033ba9a2f9c2498d6d7e5" 2025-08-14T21:18:54.2602704Z }, 2025-08-14T21:18:54.2603016Z { 2025-08-14T21:18:54.2603941Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-08-14T21:18:54.2604621Z "size": 54145699, 2025-08-14T21:18:54.2605305Z "digest": "sha256:b01371e6ad0895119adb1d1dbfa20844d061be6aecadf9165128cebb28b9fd8e" 2025-08-14T21:18:54.2606064Z } 2025-08-14T21:18:54.2606379Z ] 2025-08-14T21:18:54.2606697Z } 2025-08-14T21:18:54.2607055Z + exit 0 2025-08-14T21:18:54.2656964Z ##[group]Run set -eux 2025-08-14T21:18:54.2657435Z set -eux 2025-08-14T21:18:54.2658901Z aws secretsmanager get-secret-value --secret-id docker_hub_readonly_token | jq --raw-output '.SecretString' | jq -r .docker_hub_readonly_token | docker login --username pytorchbot --password-stdin 2025-08-14T21:18:54.2711551Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-08-14T21:18:54.2712217Z env: 2025-08-14T21:18:54.2712602Z GIT_DEFAULT_BRANCH: main 2025-08-14T21:18:54.2713303Z RUNNER_ARTIFACT_DIR: /home/pytorchci/actions-runner/_work/_temp/artifacts 2025-08-14T21:18:54.2714357Z RUNNER_TEST_RESULTS_DIR: /home/pytorchci/actions-runner/_work/_temp/test-results 2025-08-14T21:18:54.2715310Z RUNNER_DOCS_DIR: /home/pytorchci/actions-runner/_work/_temp/docs 2025-08-14T21:18:54.2716951Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --device /dev/dri --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-08-14T21:18:54.2718437Z AWS_DEFAULT_REGION: us-east-1 2025-08-14T21:18:54.2718915Z AWS_REGION: us-east-1 2025-08-14T21:18:54.2719578Z AWS_ACCESS_KEY_ID: *** 2025-08-14T21:18:54.2720238Z AWS_SECRET_ACCESS_KEY: *** 2025-08-14T21:18:54.2730664Z AWS_SESSION_TOKEN: *** 2025-08-14T21:18:54.2731137Z ##[endgroup] 2025-08-14T21:18:54.2819980Z + aws secretsmanager get-secret-value --secret-id docker_hub_readonly_token 2025-08-14T21:18:54.2820917Z + jq --raw-output .SecretString 2025-08-14T21:18:54.2821491Z + jq -r .docker_hub_readonly_token 2025-08-14T21:18:54.2822960Z + docker login --username pytorchbot --password-stdin 2025-08-14T21:18:55.0040526Z 2025-08-14T21:18:55.0043599Z An error occurred (AccessDeniedException) when calling the GetSecretValue operation: User: arn:aws:sts::308535385114:assumed-role/gha_workflow_s3_and_ecr_read_only/GitHubActions is not authorized to perform: secretsmanager:GetSecretValue on resource: docker_hub_readonly_token because no identity-based policy allows the secretsmanager:GetSecretValue action 2025-08-14T21:18:55.0835877Z Error: Cannot perform an interactive login from a non TTY device 2025-08-14T21:18:55.0882293Z ##[error]Process completed with exit code 1. 2025-08-14T21:18:55.1070223Z ##[group]Run pytorch/test-infra/.github/actions/pull-docker-image@main 2025-08-14T21:18:55.1071018Z with: 2025-08-14T21:18:55.1072205Z docker-image: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-rocm-n-py3-bfa89110622ba7202628e9faac705f183070defe 2025-08-14T21:18:55.1073688Z docker-registry: 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-08-14T21:18:55.1074383Z env: 2025-08-14T21:18:55.1074763Z GIT_DEFAULT_BRANCH: main 2025-08-14T21:18:55.1075488Z RUNNER_ARTIFACT_DIR: /home/pytorchci/actions-runner/_work/_temp/artifacts 2025-08-14T21:18:55.1076517Z RUNNER_TEST_RESULTS_DIR: /home/pytorchci/actions-runner/_work/_temp/test-results 2025-08-14T21:18:55.1077464Z RUNNER_DOCS_DIR: /home/pytorchci/actions-runner/_work/_temp/docs 2025-08-14T21:18:55.1079180Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --device /dev/dri --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-08-14T21:18:55.1080808Z AWS_DEFAULT_REGION: us-east-1 2025-08-14T21:18:55.1081315Z AWS_REGION: us-east-1 2025-08-14T21:18:55.1081905Z AWS_ACCESS_KEY_ID: *** 2025-08-14T21:18:55.1082565Z AWS_SECRET_ACCESS_KEY: *** 2025-08-14T21:18:55.1092964Z AWS_SESSION_TOKEN: *** 2025-08-14T21:18:55.1093495Z ##[endgroup] 2025-08-14T21:18:55.1122920Z ##[group]Run set -x 2025-08-14T21:18:55.1123814Z set -x 2025-08-14T21:18:55.1124203Z set +e 2025-08-14T21:18:55.1124577Z  2025-08-14T21:18:55.1124935Z login() { 2025-08-14T21:18:55.1125777Z  aws ecr get-login-password --region us-east-1 | docker login -u AWS --password-stdin "$1" 2025-08-14T21:18:55.1126673Z } 2025-08-14T21:18:55.1127037Z  2025-08-14T21:18:55.1127398Z retry () { 2025-08-14T21:18:55.1127857Z  $* || (sleep 1 && $*) || (sleep 2 && $*) 2025-08-14T21:18:55.1128392Z } 2025-08-14T21:18:55.1128748Z  2025-08-14T21:18:55.1129140Z retry login "${DOCKER_REGISTRY}" 2025-08-14T21:18:55.1129655Z  2025-08-14T21:18:55.1130478Z IMAGE_SIZE=$(docker manifest inspect "${DOCKER_IMAGE}" | jq '[.layers[].size, .config.size] | add / 1024 / 1024') 2025-08-14T21:18:55.1131586Z echo "Compressed size of image in MB: ${IMAGE_SIZE}" 2025-08-14T21:18:55.1132236Z  2025-08-14T21:18:55.1132666Z set -e 2025-08-14T21:18:55.1133344Z # ignore output since only exit code is used for conditional 2025-08-14T21:18:55.1134334Z # only pull docker image if it's not available locally 2025-08-14T21:18:55.1135425Z if ! docker inspect --type=image "${DOCKER_IMAGE}" >/dev/null 2>/dev/null; then 2025-08-14T21:18:55.1136448Z  retry docker pull "${DOCKER_IMAGE}" 2025-08-14T21:18:55.1137088Z fi 2025-08-14T21:18:55.1189702Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-08-14T21:18:55.1190350Z env: 2025-08-14T21:18:55.1190744Z GIT_DEFAULT_BRANCH: main 2025-08-14T21:18:55.1191478Z RUNNER_ARTIFACT_DIR: /home/pytorchci/actions-runner/_work/_temp/artifacts 2025-08-14T21:18:55.1192511Z RUNNER_TEST_RESULTS_DIR: /home/pytorchci/actions-runner/_work/_temp/test-results 2025-08-14T21:18:55.1193472Z RUNNER_DOCS_DIR: /home/pytorchci/actions-runner/_work/_temp/docs 2025-08-14T21:18:55.1195119Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --device /dev/dri --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-08-14T21:18:55.1196631Z AWS_DEFAULT_REGION: us-east-1 2025-08-14T21:18:55.1197129Z AWS_REGION: us-east-1 2025-08-14T21:18:55.1197693Z AWS_ACCESS_KEY_ID: *** 2025-08-14T21:18:55.1198347Z AWS_SECRET_ACCESS_KEY: *** 2025-08-14T21:18:55.1208824Z AWS_SESSION_TOKEN: *** 2025-08-14T21:18:55.1210519Z DOCKER_IMAGE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-rocm-n-py3-bfa89110622ba7202628e9faac705f183070defe 2025-08-14T21:18:55.1212032Z DOCKER_REGISTRY: 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-08-14T21:18:55.1212802Z ##[endgroup] 2025-08-14T21:18:55.1293344Z + set +e 2025-08-14T21:18:55.1295135Z + retry login 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-08-14T21:18:55.1296164Z + login 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-08-14T21:18:55.1301897Z + aws ecr get-login-password --region us-east-1 2025-08-14T21:18:55.1302964Z + docker login -u AWS --password-stdin 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-08-14T21:18:56.6879079Z WARNING! Your password will be stored unencrypted in /home/pytorchci/.docker/config.json. 2025-08-14T21:18:56.6880233Z Configure a credential helper to remove this warning. See 2025-08-14T21:18:56.6881493Z https://docs.docker.com/engine/reference/commandline/login/#credentials-store 2025-08-14T21:18:56.6882184Z 2025-08-14T21:18:56.6884074Z Login Succeeded 2025-08-14T21:18:56.6928117Z ++ docker manifest inspect 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-rocm-n-py3-bfa89110622ba7202628e9faac705f183070defe 2025-08-14T21:18:56.6929699Z ++ jq '[.layers[].size, .config.size] | add / 1024 / 1024' 2025-08-14T21:18:58.2040682Z + IMAGE_SIZE=17743.216945648193 2025-08-14T21:18:58.2041562Z + echo 'Compressed size of image in MB: 17743.216945648193' 2025-08-14T21:18:58.2042345Z + set -e 2025-08-14T21:18:58.2044676Z + docker inspect --type=image 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-rocm-n-py3-bfa89110622ba7202628e9faac705f183070defe 2025-08-14T21:18:58.2046486Z Compressed size of image in MB: 17743.216945648193 2025-08-14T21:18:58.2379867Z Prepare all required actions 2025-08-14T21:18:58.2434772Z ##[group]Run ./.github/actions/get-workflow-job-id 2025-08-14T21:18:58.2435238Z with: 2025-08-14T21:18:58.2435871Z github-token: *** 2025-08-14T21:18:58.2436229Z env: 2025-08-14T21:18:58.2436558Z GIT_DEFAULT_BRANCH: main 2025-08-14T21:18:58.2437123Z RUNNER_ARTIFACT_DIR: /home/pytorchci/actions-runner/_work/_temp/artifacts 2025-08-14T21:18:58.2437939Z RUNNER_TEST_RESULTS_DIR: /home/pytorchci/actions-runner/_work/_temp/test-results 2025-08-14T21:18:58.2438687Z RUNNER_DOCS_DIR: /home/pytorchci/actions-runner/_work/_temp/docs 2025-08-14T21:18:58.2439986Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --device /dev/dri --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-08-14T21:18:58.2441295Z AWS_DEFAULT_REGION: us-east-1 2025-08-14T21:18:58.2441692Z AWS_REGION: us-east-1 2025-08-14T21:18:58.2442179Z AWS_ACCESS_KEY_ID: *** 2025-08-14T21:18:58.2442697Z AWS_SECRET_ACCESS_KEY: *** 2025-08-14T21:18:58.2450869Z AWS_SESSION_TOKEN: *** 2025-08-14T21:18:58.2451109Z ##[endgroup] 2025-08-14T21:18:58.2467326Z ##[group]Run set -eux 2025-08-14T21:18:58.2467687Z set -eux 2025-08-14T21:18:58.2468282Z python3 .github/scripts/get_workflow_job_id.py "${GITHUB_RUN_ID}" "${RUNNER_NAME}" 2025-08-14T21:18:58.2504678Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-08-14T21:18:58.2505409Z env: 2025-08-14T21:18:58.2505879Z GIT_DEFAULT_BRANCH: main 2025-08-14T21:18:58.2506734Z RUNNER_ARTIFACT_DIR: /home/pytorchci/actions-runner/_work/_temp/artifacts 2025-08-14T21:18:58.2507895Z RUNNER_TEST_RESULTS_DIR: /home/pytorchci/actions-runner/_work/_temp/test-results 2025-08-14T21:18:58.2508868Z RUNNER_DOCS_DIR: /home/pytorchci/actions-runner/_work/_temp/docs 2025-08-14T21:18:58.2510535Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --device /dev/dri --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-08-14T21:18:58.2512051Z AWS_DEFAULT_REGION: us-east-1 2025-08-14T21:18:58.2512538Z AWS_REGION: us-east-1 2025-08-14T21:18:58.2513114Z AWS_ACCESS_KEY_ID: *** 2025-08-14T21:18:58.2513819Z AWS_SECRET_ACCESS_KEY: *** 2025-08-14T21:18:58.2522940Z AWS_SESSION_TOKEN: *** 2025-08-14T21:18:58.2523273Z GITHUB_TOKEN: *** 2025-08-14T21:18:58.2523484Z ##[endgroup] 2025-08-14T21:18:58.2594365Z + python3 .github/scripts/get_workflow_job_id.py 16976255019 pytorch-rocm-hw-43 2025-08-14T21:18:58.7151152Z Setting output job-id=48127805673 2025-08-14T21:18:58.7152136Z Setting output job-name=linux-jammy-rocm-py3.10 / test (default, 2, 6, linux.rocm.gpu.2) 2025-08-14T21:18:58.7376546Z Prepare all required actions 2025-08-14T21:18:58.7377328Z Getting action download info 2025-08-14T21:18:58.9372342Z Download action repository 'seemethere/download-artifact-s3@v4' (SHA:1da556a7aa0a088e3153970611f6c432d58e80e6) 2025-08-14T21:18:59.5733280Z Download action repository 'actions/download-artifact@v4' (SHA:d3f86a106a0bac45b974a628896c90dbdf5c8093) 2025-08-14T21:19:00.2239898Z ##[group]Run ./.github/actions/download-build-artifacts 2025-08-14T21:19:00.2240172Z with: 2025-08-14T21:19:00.2240413Z name: linux-jammy-rocm-py3.10 2025-08-14T21:19:00.2240630Z s3-bucket: gha-artifacts 2025-08-14T21:19:00.2240820Z env: 2025-08-14T21:19:00.2240980Z GIT_DEFAULT_BRANCH: main 2025-08-14T21:19:00.2241276Z RUNNER_ARTIFACT_DIR: /home/pytorchci/actions-runner/_work/_temp/artifacts 2025-08-14T21:19:00.2241700Z RUNNER_TEST_RESULTS_DIR: /home/pytorchci/actions-runner/_work/_temp/test-results 2025-08-14T21:19:00.2242210Z RUNNER_DOCS_DIR: /home/pytorchci/actions-runner/_work/_temp/docs 2025-08-14T21:19:00.2243260Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --device /dev/dri --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-08-14T21:19:00.2243980Z AWS_DEFAULT_REGION: us-east-1 2025-08-14T21:19:00.2244184Z AWS_REGION: us-east-1 2025-08-14T21:19:00.2244426Z AWS_ACCESS_KEY_ID: *** 2025-08-14T21:19:00.2244698Z AWS_SECRET_ACCESS_KEY: *** 2025-08-14T21:19:00.2248896Z AWS_SESSION_TOKEN: *** 2025-08-14T21:19:00.2249081Z ##[endgroup] 2025-08-14T21:19:00.2271367Z ##[group]Run seemethere/download-artifact-s3@v4 2025-08-14T21:19:00.2271609Z with: 2025-08-14T21:19:00.2271778Z name: linux-jammy-rocm-py3.10 2025-08-14T21:19:00.2271995Z s3-bucket: gha-artifacts 2025-08-14T21:19:00.2272190Z region: us-east-1 2025-08-14T21:19:00.2272355Z env: 2025-08-14T21:19:00.2272513Z GIT_DEFAULT_BRANCH: main 2025-08-14T21:19:00.2272811Z RUNNER_ARTIFACT_DIR: /home/pytorchci/actions-runner/_work/_temp/artifacts 2025-08-14T21:19:00.2273239Z RUNNER_TEST_RESULTS_DIR: /home/pytorchci/actions-runner/_work/_temp/test-results 2025-08-14T21:19:00.2273630Z RUNNER_DOCS_DIR: /home/pytorchci/actions-runner/_work/_temp/docs 2025-08-14T21:19:00.2274308Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --device /dev/dri --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-08-14T21:19:00.2274920Z AWS_DEFAULT_REGION: us-east-1 2025-08-14T21:19:00.2275131Z AWS_REGION: us-east-1 2025-08-14T21:19:00.2275377Z AWS_ACCESS_KEY_ID: *** 2025-08-14T21:19:00.2275653Z AWS_SECRET_ACCESS_KEY: *** 2025-08-14T21:19:00.2279842Z AWS_SESSION_TOKEN: *** 2025-08-14T21:19:00.2280034Z ##[endgroup] 2025-08-14T21:19:00.6405687Z (node:1499649) NOTE: We are formalizing our plans to enter AWS SDK for JavaScript (v2) into maintenance mode in 2023. 2025-08-14T21:19:00.6406427Z 2025-08-14T21:19:00.6407513Z Please migrate your code to use AWS SDK for JavaScript (v3). 2025-08-14T21:19:00.6408619Z For more information, check the migration guide at https://a.co/7PzMCcy 2025-08-14T21:19:00.6409584Z (Use `node --trace-warnings ...` to show where the warning was created) 2025-08-14T21:19:00.9340450Z Found 1 objects with prefix pytorch/pytorch/16976255019/linux-jammy-rocm-py3.10/ 2025-08-14T21:19:00.9341743Z Starting download (1/1): /home/pytorchci/actions-runner/_work/pytorch/pytorch/artifacts.zip 2025-08-14T21:21:04.6104219Z Finished download (1/1): /home/pytorchci/actions-runner/_work/pytorch/pytorch/artifacts.zip 2025-08-14T21:21:04.6119985Z Artifact download has finished successfully 2025-08-14T21:21:04.6631806Z ##[group]Run unzip -o artifacts.zip 2025-08-14T21:21:04.6632384Z unzip -o artifacts.zip 2025-08-14T21:21:04.6682517Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-08-14T21:21:04.6683203Z env: 2025-08-14T21:21:04.6683601Z GIT_DEFAULT_BRANCH: main 2025-08-14T21:21:04.6684777Z RUNNER_ARTIFACT_DIR: /home/pytorchci/actions-runner/_work/_temp/artifacts 2025-08-14T21:21:04.6685868Z RUNNER_TEST_RESULTS_DIR: /home/pytorchci/actions-runner/_work/_temp/test-results 2025-08-14T21:21:04.6686864Z RUNNER_DOCS_DIR: /home/pytorchci/actions-runner/_work/_temp/docs 2025-08-14T21:21:04.6688567Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --device /dev/dri --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-08-14T21:21:04.6690079Z AWS_DEFAULT_REGION: us-east-1 2025-08-14T21:21:04.6690574Z AWS_REGION: us-east-1 2025-08-14T21:21:04.6691157Z AWS_ACCESS_KEY_ID: *** 2025-08-14T21:21:04.6691821Z AWS_SECRET_ACCESS_KEY: *** 2025-08-14T21:21:04.6701648Z AWS_SESSION_TOKEN: *** 2025-08-14T21:21:04.6702083Z ##[endgroup] 2025-08-14T21:21:04.6804222Z Archive: artifacts.zip 2025-08-14T21:21:04.6805693Z creating: dist/ 2025-08-14T21:21:09.8501473Z inflating: dist/torch-2.9.0a0+git1fc683c-cp310-cp310-linux_x86_64.whl 2025-08-14T21:21:09.8628628Z inflating: dist/.ninja_log 2025-08-14T21:21:09.8633951Z creating: build/custom_test_artifacts/ 2025-08-14T21:21:09.8634817Z creating: build/custom_test_artifacts/custom-op-build/ 2025-08-14T21:21:09.8635689Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/ 2025-08-14T21:21:09.8636856Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/pkgRedirects/ 2025-08-14T21:21:09.8638266Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/CMakeConfigureLog.yaml 2025-08-14T21:21:09.8639427Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/4.0.0/ 2025-08-14T21:21:09.8640645Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/4.0.0/CMakeSystem.cmake 2025-08-14T21:21:09.8641763Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/4.0.0/CompilerIdC/ 2025-08-14T21:21:09.8642845Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/4.0.0/CompilerIdC/tmp/ 2025-08-14T21:21:09.8644183Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/4.0.0/CompilerIdC/CMakeCCompilerId.c 2025-08-14T21:21:09.8645465Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/4.0.0/CompilerIdC/a.out 2025-08-14T21:21:09.8646633Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/4.0.0/CMakeCCompiler.cmake 2025-08-14T21:21:09.8647756Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/4.0.0/CompilerIdCXX/ 2025-08-14T21:21:09.8648843Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/4.0.0/CompilerIdCXX/tmp/ 2025-08-14T21:21:09.8650113Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/4.0.0/CompilerIdCXX/CMakeCXXCompilerId.cpp 2025-08-14T21:21:09.8651414Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/4.0.0/CompilerIdCXX/a.out 2025-08-14T21:21:09.8652622Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/4.0.0/CMakeCXXCompiler.cmake 2025-08-14T21:21:09.8653935Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/4.0.0/CMakeDetermineCompilerABI_C.bin 2025-08-14T21:21:09.8655337Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/4.0.0/CMakeDetermineCompilerABI_CXX.bin 2025-08-14T21:21:09.8656517Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/CMakeScratch/ 2025-08-14T21:21:09.8657462Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/CMakeTmp/ 2025-08-14T21:21:09.8658444Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/cmake.check_cache 2025-08-14T21:21:09.8659472Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/ 2025-08-14T21:21:09.8660621Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/compiler_depend.ts 2025-08-14T21:21:09.8661917Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/compiler_depend.make 2025-08-14T21:21:09.8663808Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/depend.make 2025-08-14T21:21:09.8665034Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/link.txt 2025-08-14T21:21:09.8666262Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/cmake_clean.cmake 2025-08-14T21:21:09.8667554Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/build.make 2025-08-14T21:21:09.8668841Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/DependInfo.cmake 2025-08-14T21:21:09.8670120Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/flags.make 2025-08-14T21:21:09.8671396Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/progress.make 2025-08-14T21:21:09.8672678Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/op.cpp.o.d 2025-08-14T21:21:09.8853466Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/op.cpp.o 2025-08-14T21:21:09.8855380Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/ 2025-08-14T21:21:09.8856734Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/compiler_depend.ts 2025-08-14T21:21:09.8858273Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/compiler_depend.make 2025-08-14T21:21:09.8859677Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/depend.make 2025-08-14T21:21:09.8861028Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/link.txt 2025-08-14T21:21:09.8862381Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/cmake_clean.cmake 2025-08-14T21:21:09.8863759Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/build.make 2025-08-14T21:21:09.8865122Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/DependInfo.cmake 2025-08-14T21:21:09.8866490Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/flags.make 2025-08-14T21:21:09.8867850Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/progress.make 2025-08-14T21:21:09.8878519Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/test_custom_ops.cpp.o.d 2025-08-14T21:21:09.8953748Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/test_custom_ops.cpp.o 2025-08-14T21:21:09.8955377Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/CMakeDirectoryInformation.cmake 2025-08-14T21:21:09.8956742Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/TargetDirectories.txt 2025-08-14T21:21:09.8957981Z extracting: build/custom_test_artifacts/custom-op-build/CMakeFiles/progress.marks 2025-08-14T21:21:09.8959134Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/Makefile2 2025-08-14T21:21:09.8960252Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/Makefile.cmake 2025-08-14T21:21:09.8961574Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/InstallScripts.json 2025-08-14T21:21:09.8962742Z inflating: build/custom_test_artifacts/custom-op-build/hipblaslt_test_outer_vec.cc 2025-08-14T21:21:09.8963847Z inflating: build/custom_test_artifacts/custom-op-build/hipblaslt_test_vec_ext.cc 2025-08-14T21:21:09.8964855Z inflating: build/custom_test_artifacts/custom-op-build/CMakeCache.txt 2025-08-14T21:21:09.8965781Z inflating: build/custom_test_artifacts/custom-op-build/Makefile 2025-08-14T21:21:09.8966721Z inflating: build/custom_test_artifacts/custom-op-build/cmake_install.cmake 2025-08-14T21:21:09.9111045Z inflating: build/custom_test_artifacts/custom-op-build/libcustom_ops.so 2025-08-14T21:21:09.9162829Z inflating: build/custom_test_artifacts/custom-op-build/test_custom_ops 2025-08-14T21:21:09.9163875Z creating: build/custom_test_artifacts/jit-hook-build/ 2025-08-14T21:21:09.9164729Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/ 2025-08-14T21:21:09.9165702Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/pkgRedirects/ 2025-08-14T21:21:09.9166891Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/CMakeConfigureLog.yaml 2025-08-14T21:21:09.9168031Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/4.0.0/ 2025-08-14T21:21:09.9169098Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/4.0.0/CMakeSystem.cmake 2025-08-14T21:21:09.9170240Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/4.0.0/CompilerIdC/ 2025-08-14T21:21:09.9171360Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/4.0.0/CompilerIdC/tmp/ 2025-08-14T21:21:09.9172672Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/4.0.0/CompilerIdC/CMakeCCompilerId.c 2025-08-14T21:21:09.9174313Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/4.0.0/CompilerIdC/a.out 2025-08-14T21:21:09.9175534Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/4.0.0/CMakeCCompiler.cmake 2025-08-14T21:21:09.9176708Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/4.0.0/CompilerIdCXX/ 2025-08-14T21:21:09.9177851Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/4.0.0/CompilerIdCXX/tmp/ 2025-08-14T21:21:09.9179223Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/4.0.0/CompilerIdCXX/CMakeCXXCompilerId.cpp 2025-08-14T21:21:09.9180600Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/4.0.0/CompilerIdCXX/a.out 2025-08-14T21:21:09.9181847Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/4.0.0/CMakeCXXCompiler.cmake 2025-08-14T21:21:09.9206317Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/4.0.0/CMakeDetermineCompilerABI_C.bin 2025-08-14T21:21:09.9207860Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/4.0.0/CMakeDetermineCompilerABI_CXX.bin 2025-08-14T21:21:09.9209160Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/CMakeScratch/ 2025-08-14T21:21:09.9210190Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/CMakeTmp/ 2025-08-14T21:21:09.9211263Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/cmake.check_cache 2025-08-14T21:21:09.9212414Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/ 2025-08-14T21:21:09.9213716Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/compiler_depend.ts 2025-08-14T21:21:09.9215155Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/compiler_depend.make 2025-08-14T21:21:09.9216542Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/depend.make 2025-08-14T21:21:09.9217824Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/link.txt 2025-08-14T21:21:09.9219161Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/cmake_clean.cmake 2025-08-14T21:21:09.9220490Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/build.make 2025-08-14T21:21:09.9221816Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/DependInfo.cmake 2025-08-14T21:21:09.9223141Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/flags.make 2025-08-14T21:21:09.9224446Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/progress.make 2025-08-14T21:21:09.9225879Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/test_jit_hooks.cpp.o.d 2025-08-14T21:21:09.9264285Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/test_jit_hooks.cpp.o 2025-08-14T21:21:09.9265825Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/CMakeDirectoryInformation.cmake 2025-08-14T21:21:09.9267154Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/TargetDirectories.txt 2025-08-14T21:21:09.9268330Z extracting: build/custom_test_artifacts/jit-hook-build/CMakeFiles/progress.marks 2025-08-14T21:21:09.9269437Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/Makefile2 2025-08-14T21:21:09.9270492Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/Makefile.cmake 2025-08-14T21:21:09.9271610Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/InstallScripts.json 2025-08-14T21:21:09.9272769Z inflating: build/custom_test_artifacts/jit-hook-build/hipblaslt_test_outer_vec.cc 2025-08-14T21:21:09.9273844Z inflating: build/custom_test_artifacts/jit-hook-build/hipblaslt_test_vec_ext.cc 2025-08-14T21:21:09.9274855Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeCache.txt 2025-08-14T21:21:09.9276103Z inflating: build/custom_test_artifacts/jit-hook-build/Makefile 2025-08-14T21:21:09.9277041Z inflating: build/custom_test_artifacts/jit-hook-build/cmake_install.cmake 2025-08-14T21:21:09.9305912Z inflating: build/custom_test_artifacts/jit-hook-build/test_jit_hooks 2025-08-14T21:21:09.9306815Z creating: build/custom_test_artifacts/custom-backend-build/ 2025-08-14T21:21:09.9307695Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/ 2025-08-14T21:21:09.9308730Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/pkgRedirects/ 2025-08-14T21:21:09.9310611Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/CMakeConfigureLog.yaml 2025-08-14T21:21:09.9311812Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/4.0.0/ 2025-08-14T21:21:09.9313008Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/4.0.0/CMakeSystem.cmake 2025-08-14T21:21:09.9314301Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/4.0.0/CompilerIdC/ 2025-08-14T21:21:09.9315539Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/4.0.0/CompilerIdC/tmp/ 2025-08-14T21:21:09.9316940Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/4.0.0/CompilerIdC/CMakeCCompilerId.c 2025-08-14T21:21:09.9318396Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/4.0.0/CompilerIdC/a.out 2025-08-14T21:21:09.9319709Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/4.0.0/CMakeCCompiler.cmake 2025-08-14T21:21:09.9321125Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/4.0.0/CompilerIdCXX/ 2025-08-14T21:21:09.9322429Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/4.0.0/CompilerIdCXX/tmp/ 2025-08-14T21:21:09.9323913Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/4.0.0/CompilerIdCXX/CMakeCXXCompilerId.cpp 2025-08-14T21:21:09.9325410Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/4.0.0/CompilerIdCXX/a.out 2025-08-14T21:21:09.9326777Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/4.0.0/CMakeCXXCompiler.cmake 2025-08-14T21:21:09.9328236Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/4.0.0/CMakeDetermineCompilerABI_C.bin 2025-08-14T21:21:09.9329807Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/4.0.0/CMakeDetermineCompilerABI_CXX.bin 2025-08-14T21:21:09.9331160Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/CMakeScratch/ 2025-08-14T21:21:09.9332283Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/CMakeTmp/ 2025-08-14T21:21:09.9333419Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/cmake.check_cache 2025-08-14T21:21:09.9334627Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/ 2025-08-14T21:21:09.9336331Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/compiler_depend.ts 2025-08-14T21:21:09.9337940Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/compiler_depend.make 2025-08-14T21:21:09.9339423Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/depend.make 2025-08-14T21:21:09.9340805Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/link.txt 2025-08-14T21:21:09.9342248Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/cmake_clean.cmake 2025-08-14T21:21:09.9343714Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/build.make 2025-08-14T21:21:09.9345170Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/DependInfo.cmake 2025-08-14T21:21:09.9346623Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/flags.make 2025-08-14T21:21:09.9348345Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/progress.make 2025-08-14T21:21:09.9349882Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/custom_backend.cpp.o.d 2025-08-14T21:21:09.9445672Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/custom_backend.cpp.o 2025-08-14T21:21:09.9447142Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/ 2025-08-14T21:21:09.9448595Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/compiler_depend.ts 2025-08-14T21:21:09.9450223Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/compiler_depend.make 2025-08-14T21:21:09.9451803Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/depend.make 2025-08-14T21:21:09.9453300Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/link.txt 2025-08-14T21:21:09.9454801Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/cmake_clean.cmake 2025-08-14T21:21:09.9456324Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/build.make 2025-08-14T21:21:09.9457846Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/DependInfo.cmake 2025-08-14T21:21:09.9459358Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/flags.make 2025-08-14T21:21:09.9460856Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/progress.make 2025-08-14T21:21:09.9470260Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/test_custom_backend.cpp.o.d 2025-08-14T21:21:09.9520894Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/test_custom_backend.cpp.o 2025-08-14T21:21:09.9522496Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/CMakeDirectoryInformation.cmake 2025-08-14T21:21:09.9523892Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/TargetDirectories.txt 2025-08-14T21:21:09.9525145Z extracting: build/custom_test_artifacts/custom-backend-build/CMakeFiles/progress.marks 2025-08-14T21:21:09.9526295Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/Makefile2 2025-08-14T21:21:09.9527447Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/Makefile.cmake 2025-08-14T21:21:09.9528653Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/InstallScripts.json 2025-08-14T21:21:09.9530225Z inflating: build/custom_test_artifacts/custom-backend-build/hipblaslt_test_outer_vec.cc 2025-08-14T21:21:09.9531423Z inflating: build/custom_test_artifacts/custom-backend-build/hipblaslt_test_vec_ext.cc 2025-08-14T21:21:09.9532509Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeCache.txt 2025-08-14T21:21:09.9533490Z inflating: build/custom_test_artifacts/custom-backend-build/Makefile 2025-08-14T21:21:09.9534532Z inflating: build/custom_test_artifacts/custom-backend-build/cmake_install.cmake 2025-08-14T21:21:09.9620860Z inflating: build/custom_test_artifacts/custom-backend-build/libcustom_backend.so 2025-08-14T21:21:09.9656761Z inflating: build/custom_test_artifacts/custom-backend-build/test_custom_backend 2025-08-14T21:21:09.9657583Z creating: build/lib/ 2025-08-14T21:21:09.9732704Z inflating: build/lib/libprotobuf-lite.a 2025-08-14T21:21:10.0139497Z inflating: build/lib/libprotobuf.a 2025-08-14T21:21:10.0594617Z inflating: build/lib/libprotoc.a 2025-08-14T21:21:10.0603434Z inflating: build/lib/libpthreadpool.a 2025-08-14T21:21:10.0610830Z inflating: build/lib/libcpuinfo.a 2025-08-14T21:21:10.0617913Z inflating: build/lib/libcpuinfo_internals.a 2025-08-14T21:21:10.0618641Z inflating: build/lib/libclog.a 2025-08-14T21:21:10.0635834Z inflating: build/lib/libpytorch_qnnpack.a 2025-08-14T21:21:10.0637269Z inflating: build/lib/libnnpack_reference_layers.a 2025-08-14T21:21:10.0653824Z inflating: build/lib/libnnpack.a 2025-08-14T21:21:10.0824234Z inflating: build/lib/libmicrokernels-prod.a 2025-08-14T21:21:10.1624665Z inflating: build/lib/libmicrokernels-all.a 2025-08-14T21:21:10.1688686Z inflating: build/lib/libgtest.a 2025-08-14T21:21:10.1704250Z inflating: build/lib/libgmock.a 2025-08-14T21:21:10.1704932Z inflating: build/lib/libgtest_main.a 2025-08-14T21:21:10.1705550Z inflating: build/lib/libgmock_main.a 2025-08-14T21:21:10.1787914Z inflating: build/lib/libXNNPACK.a 2025-08-14T21:21:10.1856611Z inflating: build/lib/libbenchmark.a 2025-08-14T21:21:10.1857370Z inflating: build/lib/libbenchmark_main.a 2025-08-14T21:21:10.1864381Z inflating: build/lib/libittnotify.a 2025-08-14T21:21:10.1865106Z inflating: build/lib/libjitprofiling.a 2025-08-14T21:21:10.1924238Z inflating: build/lib/libasmjit.a 2025-08-14T21:21:10.3025722Z inflating: build/lib/libfbgemm.a 2025-08-14T21:21:10.3053378Z inflating: build/lib/libtensorpipe_uv.a 2025-08-14T21:21:10.3564430Z inflating: build/lib/libtensorpipe.a 2025-08-14T21:21:10.3674999Z inflating: build/lib/libgloo.a 2025-08-14T21:21:10.3718524Z inflating: build/lib/libonnx_proto.a 2025-08-14T21:21:10.4117639Z inflating: build/lib/libgloo_hip.a 2025-08-14T21:21:10.4779475Z inflating: build/lib/libonnx.a 2025-08-14T21:21:11.4268625Z inflating: build/lib/libdnnl.a 2025-08-14T21:21:11.4285182Z inflating: build/lib/libfmt.a 2025-08-14T21:21:11.4550890Z inflating: build/lib/libkineto.a 2025-08-14T21:21:11.4650641Z inflating: build/lib/libc10.so 2025-08-14T21:21:11.4651386Z inflating: build/lib/libtorch_global_deps.so 2025-08-14T21:21:11.4652133Z inflating: build/lib/libcaffe2_nvrtc.so 2025-08-14T21:21:11.4687899Z inflating: build/lib/libc10_hip.so 2025-08-14T21:21:14.2129673Z inflating: build/lib/libtorch_cpu.so 2025-08-14T21:21:14.2132090Z inflating: build/lib/libshm.so 2025-08-14T21:21:15.0057431Z inflating: build/lib/libtorch_hip.so 2025-08-14T21:21:15.0058155Z inflating: build/lib/libtorch.so 2025-08-14T21:21:15.0075112Z inflating: build/lib/libjitbackend_test.so 2025-08-14T21:21:15.0139615Z inflating: build/lib/libtorchbind_test.so 2025-08-14T21:21:15.0161483Z inflating: build/lib/libbackend_with_compiler.so 2025-08-14T21:21:15.0185787Z inflating: build/lib/libaoti_custom_ops.so 2025-08-14T21:21:15.2054300Z inflating: build/lib/libtorch_python.so 2025-08-14T21:21:15.2084709Z inflating: build/lib/libnnapi_backend.so 2025-08-14T21:21:15.2085382Z creating: build/bin/ 2025-08-14T21:21:15.2086472Z creating: build/bin/CMakeFiles/ 2025-08-14T21:21:15.2087108Z inflating: build/bin/cmake_install.cmake 2025-08-14T21:21:15.2087727Z inflating: build/bin/CTestTestfile.cmake 2025-08-14T21:21:15.2498550Z inflating: build/bin/protoc-3.13.0.0 2025-08-14T21:21:15.2910649Z inflating: build/bin/protoc 2025-08-14T21:21:15.2963814Z inflating: build/bin/c10_AllocatorConfig_test 2025-08-14T21:21:15.3013708Z inflating: build/bin/c10_CompileTimeFunctionPointer_test 2025-08-14T21:21:15.3065219Z inflating: build/bin/c10_DeviceGuard_test 2025-08-14T21:21:15.3116800Z inflating: build/bin/c10_Device_test 2025-08-14T21:21:15.3175878Z inflating: build/bin/c10_DispatchKeySet_test 2025-08-14T21:21:15.3229714Z inflating: build/bin/c10_Scalar_test 2025-08-14T21:21:15.3284041Z inflating: build/bin/c10_InlineDeviceGuard_test 2025-08-14T21:21:15.3333086Z inflating: build/bin/c10_StreamGuard_test 2025-08-14T21:21:15.3388788Z inflating: build/bin/c10_InlineStreamGuard_test 2025-08-14T21:21:15.3439805Z inflating: build/bin/c10_SymInt_test 2025-08-14T21:21:15.3495682Z inflating: build/bin/c10_SizesAndStrides_test 2025-08-14T21:21:15.3544693Z inflating: build/bin/c10_ConstexprCrc_test 2025-08-14T21:21:15.3613356Z inflating: build/bin/c10_cow_test 2025-08-14T21:21:15.3663362Z inflating: build/bin/c10_ArrayRef_test 2025-08-14T21:21:15.3713168Z inflating: build/bin/c10_DeadlockDetection_test 2025-08-14T21:21:15.3765982Z inflating: build/bin/c10_Bitset_test 2025-08-14T21:21:15.3822898Z inflating: build/bin/c10_Enumerate_test 2025-08-14T21:21:15.3873353Z inflating: build/bin/c10_Half_test 2025-08-14T21:21:15.3928947Z inflating: build/bin/c10_LeftRight_test 2025-08-14T21:21:15.3981861Z inflating: build/bin/c10_IntrusiveList_test 2025-08-14T21:21:15.4037083Z inflating: build/bin/c10_Metaprogramming_test 2025-08-14T21:21:15.4086645Z inflating: build/bin/c10_Semaphore_test 2025-08-14T21:21:15.4139610Z inflating: build/bin/c10_NetworkFlow_test 2025-08-14T21:21:15.4189426Z inflating: build/bin/c10_Synchronized_test 2025-08-14T21:21:15.4244541Z inflating: build/bin/c10_ThreadLocal_test 2025-08-14T21:21:15.4296371Z inflating: build/bin/c10_TypeIndex_test 2025-08-14T21:21:15.4347376Z inflating: build/bin/c10_TypeList_test 2025-08-14T21:21:15.4398976Z inflating: build/bin/c10_accumulate_test 2025-08-14T21:21:15.4448166Z inflating: build/bin/c10_TypeTraits_test 2025-08-14T21:21:15.4503676Z inflating: build/bin/c10_bfloat16_test 2025-08-14T21:21:15.4553245Z inflating: build/bin/c10_error_test 2025-08-14T21:21:15.4610158Z inflating: build/bin/c10_complex_math_test 2025-08-14T21:21:15.4660519Z inflating: build/bin/c10_bit_cast_test 2025-08-14T21:21:15.4712915Z inflating: build/bin/c10_exception_test 2025-08-14T21:21:15.4768074Z inflating: build/bin/c10_complex_test 2025-08-14T21:21:15.4818465Z inflating: build/bin/c10_flags_test 2025-08-14T21:21:15.4871740Z inflating: build/bin/c10_lazy_test 2025-08-14T21:21:15.4922235Z inflating: build/bin/c10_generic_math_test 2025-08-14T21:21:15.4973260Z inflating: build/bin/c10_irange_test 2025-08-14T21:21:15.5131718Z inflating: build/bin/c10_intrusive_ptr_test 2025-08-14T21:21:15.5188709Z inflating: build/bin/c10_logging_test 2025-08-14T21:21:15.5262567Z inflating: build/bin/c10_optional_test 2025-08-14T21:21:15.5315659Z inflating: build/bin/c10_registry_test 2025-08-14T21:21:15.5377037Z inflating: build/bin/c10_ordered_preserving_dict_test 2025-08-14T21:21:15.5526507Z inflating: build/bin/c10_small_vector_test 2025-08-14T21:21:15.5582732Z inflating: build/bin/c10_string_util_test 2025-08-14T21:21:15.5634212Z inflating: build/bin/c10_ssize_test 2025-08-14T21:21:15.5677849Z inflating: build/bin/c10_intrusive_ptr_benchmark 2025-08-14T21:21:15.5727139Z inflating: build/bin/c10_string_view_test 2025-08-14T21:21:15.5777284Z inflating: build/bin/c10_tempfile_test 2025-08-14T21:21:15.5833403Z inflating: build/bin/c10_typeid_test 2025-08-14T21:21:15.5883187Z inflating: build/bin/c10_hip_HIPAssertionsTest_1_var_test 2025-08-14T21:21:15.5931844Z inflating: build/bin/c10_hip_HIPAssertionsTest_catches_stream 2025-08-14T21:21:15.5981083Z inflating: build/bin/c10_hip_HIPAssertionsTest_catches_thread_and_block_and_device 2025-08-14T21:21:15.6029993Z inflating: build/bin/c10_hip_HIPAssertionsTest_from_2_processes 2025-08-14T21:21:15.6079295Z inflating: build/bin/c10_hip_HIPAssertionsTest_multiple_writes_from_blocks_and_threads 2025-08-14T21:21:15.6128435Z inflating: build/bin/c10_hip_HIPAssertionsTest_multiple_writes_from_multiple_blocks 2025-08-14T21:21:15.6177568Z inflating: build/bin/c10_hip_HIPAssertionsTest_multiple_writes_from_same_block 2025-08-14T21:21:15.6226815Z inflating: build/bin/c10_hip_HIPTest 2025-08-14T21:21:15.6782911Z inflating: build/bin/vec_test_all_types_DEFAULT 2025-08-14T21:21:15.7355017Z inflating: build/bin/vec_test_all_types_AVX512 2025-08-14T21:21:15.7932662Z inflating: build/bin/vec_test_all_types_AVX2 2025-08-14T21:21:15.7985159Z inflating: build/bin/BackoffTest 2025-08-14T21:21:15.8038155Z inflating: build/bin/FileStoreTest 2025-08-14T21:21:15.8094760Z inflating: build/bin/TCPStoreTest 2025-08-14T21:21:15.8148522Z inflating: build/bin/HashStoreTest 2025-08-14T21:21:15.8213806Z inflating: build/bin/ProcessGroupGlooTest 2025-08-14T21:21:15.8215562Z inflating: build/bin/example_allreduce 2025-08-14T21:21:15.8219984Z inflating: build/bin/torch_shm_manager 2025-08-14T21:21:15.8273336Z inflating: build/bin/static_runtime_bench 2025-08-14T21:21:15.8512702Z inflating: build/bin/static_runtime_test 2025-08-14T21:21:15.8585480Z inflating: build/bin/Dict_test 2025-08-14T21:21:15.8637867Z inflating: build/bin/Dimname_test 2025-08-14T21:21:15.8702609Z inflating: build/bin/MaybeOwned_test 2025-08-14T21:21:15.8759313Z inflating: build/bin/NamedTensor_test 2025-08-14T21:21:15.8817416Z inflating: build/bin/apply_utils_test 2025-08-14T21:21:15.8875965Z inflating: build/bin/atest 2025-08-14T21:21:15.8939246Z inflating: build/bin/basic 2025-08-14T21:21:15.8994502Z inflating: build/bin/broadcast_test 2025-08-14T21:21:15.9045437Z inflating: build/bin/cpu_allocator_test 2025-08-14T21:21:15.9103076Z inflating: build/bin/cpu_generator_test 2025-08-14T21:21:15.9155942Z inflating: build/bin/cpu_profiling_allocator_test 2025-08-14T21:21:15.9245986Z inflating: build/bin/cpu_rng_test 2025-08-14T21:21:15.9296978Z inflating: build/bin/dlconvertor_test 2025-08-14T21:21:15.9353970Z inflating: build/bin/extension_backend_test 2025-08-14T21:21:15.9409336Z inflating: build/bin/half_test 2025-08-14T21:21:15.9502354Z inflating: build/bin/ivalue_test 2025-08-14T21:21:15.9552261Z inflating: build/bin/lazy_tensor_test 2025-08-14T21:21:15.9606053Z inflating: build/bin/math_kernel_test 2025-08-14T21:21:15.9659422Z inflating: build/bin/memory_format_test 2025-08-14T21:21:15.9710487Z inflating: build/bin/operator_name_test 2025-08-14T21:21:15.9763920Z inflating: build/bin/memory_overlapping_test 2025-08-14T21:21:15.9819614Z inflating: build/bin/native_test 2025-08-14T21:21:15.9872702Z inflating: build/bin/mobile_memory_cleanup 2025-08-14T21:21:15.9923448Z inflating: build/bin/operators_test 2025-08-14T21:21:15.9975321Z inflating: build/bin/packedtensoraccessor_test 2025-08-14T21:21:16.0041592Z inflating: build/bin/pow_test 2025-08-14T21:21:16.0098642Z inflating: build/bin/quantized_test 2025-08-14T21:21:16.0149651Z inflating: build/bin/reportMemoryUsage_test 2025-08-14T21:21:16.0200024Z inflating: build/bin/reduce_ops_test 2025-08-14T21:21:16.0258598Z inflating: build/bin/scalar_test 2025-08-14T21:21:16.0314805Z inflating: build/bin/scalar_tensor_test 2025-08-14T21:21:16.0366222Z inflating: build/bin/StorageUtils_test 2025-08-14T21:21:16.0418186Z inflating: build/bin/stride_properties_test 2025-08-14T21:21:16.0473655Z inflating: build/bin/type_ptr_test 2025-08-14T21:21:16.0550906Z inflating: build/bin/tensor_iterator_test 2025-08-14T21:21:16.0601423Z inflating: build/bin/thread_init_test 2025-08-14T21:21:16.0655698Z inflating: build/bin/test_parallel 2025-08-14T21:21:16.0708200Z inflating: build/bin/undefined_tensor_test 2025-08-14T21:21:16.0767263Z inflating: build/bin/type_test 2025-08-14T21:21:16.0817225Z inflating: build/bin/verify_api_visibility 2025-08-14T21:21:16.0868296Z inflating: build/bin/weakref_test 2025-08-14T21:21:16.0937636Z inflating: build/bin/legacy_vmap_test 2025-08-14T21:21:16.0988904Z inflating: build/bin/wrapdim_test 2025-08-14T21:21:16.1092094Z inflating: build/bin/List_test 2025-08-14T21:21:16.1150957Z inflating: build/bin/IListRef_test 2025-08-14T21:21:16.1202361Z inflating: build/bin/xla_tensor_test 2025-08-14T21:21:16.1317992Z inflating: build/bin/kernel_function_legacy_test 2025-08-14T21:21:16.1411424Z inflating: build/bin/kernel_function_test 2025-08-14T21:21:16.1532969Z inflating: build/bin/kernel_lambda_legacy_test 2025-08-14T21:21:16.1598220Z inflating: build/bin/KernelFunction_test 2025-08-14T21:21:16.1697349Z inflating: build/bin/kernel_lambda_test 2025-08-14T21:21:16.1757090Z inflating: build/bin/kernel_stackbased_test 2025-08-14T21:21:16.1808277Z inflating: build/bin/CppSignature_test 2025-08-14T21:21:16.1901054Z inflating: build/bin/make_boxed_from_unboxed_functor_test 2025-08-14T21:21:16.1950071Z inflating: build/bin/op_allowlist_test 2025-08-14T21:21:16.2246916Z inflating: build/bin/op_registration_test 2025-08-14T21:21:16.2312875Z inflating: build/bin/inline_container_test 2025-08-14T21:21:16.2361939Z inflating: build/bin/hip_complex_math_test 2025-08-14T21:21:16.2416779Z inflating: build/bin/backend_fallback_test 2025-08-14T21:21:16.2469667Z inflating: build/bin/hip_apply_test 2025-08-14T21:21:16.2519068Z inflating: build/bin/hip_complex_test 2025-08-14T21:21:16.2568298Z inflating: build/bin/hip_distributions_test 2025-08-14T21:21:16.2617251Z inflating: build/bin/hip_generator_test 2025-08-14T21:21:16.2666887Z inflating: build/bin/hip_half_test 2025-08-14T21:21:16.2716056Z inflating: build/bin/hip_integer_divider_test 2025-08-14T21:21:16.2765110Z inflating: build/bin/hip_optional_test 2025-08-14T21:21:16.2814169Z inflating: build/bin/hip_packedtensoraccessor_test 2025-08-14T21:21:16.2863096Z inflating: build/bin/hip_vectorized_test 2025-08-14T21:21:16.2914765Z inflating: build/bin/hip_dlconvertor_test 2025-08-14T21:21:16.3937457Z inflating: build/bin/test_jit 2025-08-14T21:21:16.4232603Z inflating: build/bin/test_nativert 2025-08-14T21:21:16.4287735Z inflating: build/bin/test_dist_autograd 2025-08-14T21:21:16.4354332Z inflating: build/bin/test_cpp_rpc 2025-08-14T21:21:16.4355749Z inflating: build/bin/parallel_benchmark 2025-08-14T21:21:16.5430861Z inflating: build/bin/test_api 2025-08-14T21:21:16.5757990Z inflating: build/bin/test_lazy 2025-08-14T21:21:16.5758684Z creating: .additional_ci_files/ 2025-08-14T21:21:16.5830896Z inflating: .additional_ci_files/test-times.json 2025-08-14T21:21:16.6106025Z inflating: .additional_ci_files/test-class-times.json 2025-08-14T21:21:16.6160289Z ##[group]Run rm artifacts.zip 2025-08-14T21:21:16.6160943Z rm artifacts.zip 2025-08-14T21:21:16.6211554Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-08-14T21:21:16.6212185Z env: 2025-08-14T21:21:16.6212562Z GIT_DEFAULT_BRANCH: main 2025-08-14T21:21:16.6213249Z RUNNER_ARTIFACT_DIR: /home/pytorchci/actions-runner/_work/_temp/artifacts 2025-08-14T21:21:16.6214231Z RUNNER_TEST_RESULTS_DIR: /home/pytorchci/actions-runner/_work/_temp/test-results 2025-08-14T21:21:16.6215132Z RUNNER_DOCS_DIR: /home/pytorchci/actions-runner/_work/_temp/docs 2025-08-14T21:21:16.6217066Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --device /dev/dri --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-08-14T21:21:16.6218536Z AWS_DEFAULT_REGION: us-east-1 2025-08-14T21:21:16.6219009Z AWS_REGION: us-east-1 2025-08-14T21:21:16.6219596Z AWS_ACCESS_KEY_ID: *** 2025-08-14T21:21:16.6220366Z AWS_SECRET_ACCESS_KEY: *** 2025-08-14T21:21:16.6230056Z AWS_SESSION_TOKEN: *** 2025-08-14T21:21:16.6230488Z ##[endgroup] 2025-08-14T21:21:16.9537902Z ##[group]Run df -H 2025-08-14T21:21:16.9538334Z df -H 2025-08-14T21:21:16.9585517Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-08-14T21:21:16.9586187Z env: 2025-08-14T21:21:16.9586586Z GIT_DEFAULT_BRANCH: main 2025-08-14T21:21:16.9587308Z RUNNER_ARTIFACT_DIR: /home/pytorchci/actions-runner/_work/_temp/artifacts 2025-08-14T21:21:16.9588352Z RUNNER_TEST_RESULTS_DIR: /home/pytorchci/actions-runner/_work/_temp/test-results 2025-08-14T21:21:16.9589327Z RUNNER_DOCS_DIR: /home/pytorchci/actions-runner/_work/_temp/docs 2025-08-14T21:21:16.9591006Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --device /dev/dri --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-08-14T21:21:16.9592937Z AWS_DEFAULT_REGION: us-east-1 2025-08-14T21:21:16.9593437Z AWS_REGION: us-east-1 2025-08-14T21:21:16.9593987Z AWS_ACCESS_KEY_ID: *** 2025-08-14T21:21:16.9594655Z AWS_SECRET_ACCESS_KEY: *** 2025-08-14T21:21:16.9605078Z AWS_SESSION_TOKEN: *** 2025-08-14T21:21:16.9605539Z ##[endgroup] 2025-08-14T21:21:16.9708787Z Filesystem Size Used Avail Use% Mounted on 2025-08-14T21:21:16.9709628Z tmpfs 14G 17M 14G 1% /run 2025-08-14T21:21:16.9710389Z /dev/mapper/ubuntu--vg-ubuntu--lv 1.9T 1.1T 716G 61% / 2025-08-14T21:21:16.9711121Z tmpfs 68G 8.2k 68G 1% /dev/shm 2025-08-14T21:21:16.9711758Z tmpfs 5.3M 0 5.3M 0% /run/lock 2025-08-14T21:21:16.9712456Z /dev/sda2 2.1G 335M 1.6G 18% /boot 2025-08-14T21:21:16.9713229Z /dev/sda1 1.2G 6.4M 1.2G 1% /boot/efi 2025-08-14T21:21:16.9714112Z tmpfs 14G 17k 14G 1% /run/user/1001 2025-08-14T21:21:16.9775873Z Prepare all required actions 2025-08-14T21:21:16.9776590Z Getting action download info 2025-08-14T21:21:17.3356716Z ##[group]Run ./.github/actions/download-td-artifacts 2025-08-14T21:21:17.3357361Z with: 2025-08-14T21:21:17.3357727Z env: 2025-08-14T21:21:17.3358094Z GIT_DEFAULT_BRANCH: main 2025-08-14T21:21:17.3358812Z RUNNER_ARTIFACT_DIR: /home/pytorchci/actions-runner/_work/_temp/artifacts 2025-08-14T21:21:17.3359850Z RUNNER_TEST_RESULTS_DIR: /home/pytorchci/actions-runner/_work/_temp/test-results 2025-08-14T21:21:17.3361033Z RUNNER_DOCS_DIR: /home/pytorchci/actions-runner/_work/_temp/docs 2025-08-14T21:21:17.3362685Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --device /dev/dri --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-08-14T21:21:17.3364210Z AWS_DEFAULT_REGION: us-east-1 2025-08-14T21:21:17.3364703Z AWS_REGION: us-east-1 2025-08-14T21:21:17.3365362Z AWS_ACCESS_KEY_ID: *** 2025-08-14T21:21:17.3366024Z AWS_SECRET_ACCESS_KEY: *** 2025-08-14T21:21:17.3376289Z AWS_SESSION_TOKEN: *** 2025-08-14T21:21:17.3376729Z ##[endgroup] 2025-08-14T21:21:17.3428731Z ##[group]Run seemethere/download-artifact-s3@v4 2025-08-14T21:21:17.3429304Z with: 2025-08-14T21:21:17.3429662Z name: td_results 2025-08-14T21:21:17.3430083Z s3-bucket: gha-artifacts 2025-08-14T21:21:17.3430531Z region: us-east-1 2025-08-14T21:21:17.3430912Z env: 2025-08-14T21:21:17.3431277Z GIT_DEFAULT_BRANCH: main 2025-08-14T21:21:17.3431966Z RUNNER_ARTIFACT_DIR: /home/pytorchci/actions-runner/_work/_temp/artifacts 2025-08-14T21:21:17.3432982Z RUNNER_TEST_RESULTS_DIR: /home/pytorchci/actions-runner/_work/_temp/test-results 2025-08-14T21:21:17.3433924Z RUNNER_DOCS_DIR: /home/pytorchci/actions-runner/_work/_temp/docs 2025-08-14T21:21:17.3435636Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --device /dev/dri --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-08-14T21:21:17.3437174Z AWS_DEFAULT_REGION: us-east-1 2025-08-14T21:21:17.3437683Z AWS_REGION: us-east-1 2025-08-14T21:21:17.3438219Z AWS_ACCESS_KEY_ID: *** 2025-08-14T21:21:17.3438869Z AWS_SECRET_ACCESS_KEY: *** 2025-08-14T21:21:17.3449299Z AWS_SESSION_TOKEN: *** 2025-08-14T21:21:17.3449745Z ##[endgroup] 2025-08-14T21:21:17.7566619Z (node:1499757) NOTE: We are formalizing our plans to enter AWS SDK for JavaScript (v2) into maintenance mode in 2023. 2025-08-14T21:21:17.7567502Z 2025-08-14T21:21:17.7567838Z Please migrate your code to use AWS SDK for JavaScript (v3). 2025-08-14T21:21:17.7568774Z For more information, check the migration guide at https://a.co/7PzMCcy 2025-08-14T21:21:17.7569729Z (Use `node --trace-warnings ...` to show where the warning was created) 2025-08-14T21:21:18.0399205Z Found 1 objects with prefix pytorch/pytorch/16976255019/td_results/ 2025-08-14T21:21:18.0400569Z Starting download (1/1): /home/pytorchci/actions-runner/_work/pytorch/pytorch/td_results.json 2025-08-14T21:21:18.4112779Z Finished download (1/1): /home/pytorchci/actions-runner/_work/pytorch/pytorch/td_results.json 2025-08-14T21:21:18.4127629Z Artifact download has finished successfully 2025-08-14T21:21:18.4665868Z ##[group]Run mkdir -p .additional_ci_files 2025-08-14T21:21:18.4666534Z mkdir -p .additional_ci_files 2025-08-14T21:21:18.4667274Z mv td_results.json .additional_ci_files/td_results.json || true 2025-08-14T21:21:18.4720633Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-08-14T21:21:18.4721323Z env: 2025-08-14T21:21:18.4721724Z GIT_DEFAULT_BRANCH: main 2025-08-14T21:21:18.4722470Z RUNNER_ARTIFACT_DIR: /home/pytorchci/actions-runner/_work/_temp/artifacts 2025-08-14T21:21:18.4723524Z RUNNER_TEST_RESULTS_DIR: /home/pytorchci/actions-runner/_work/_temp/test-results 2025-08-14T21:21:18.4724531Z RUNNER_DOCS_DIR: /home/pytorchci/actions-runner/_work/_temp/docs 2025-08-14T21:21:18.4726698Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --device /dev/dri --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-08-14T21:21:18.4728268Z AWS_DEFAULT_REGION: us-east-1 2025-08-14T21:21:18.4728773Z AWS_REGION: us-east-1 2025-08-14T21:21:18.4729348Z AWS_ACCESS_KEY_ID: *** 2025-08-14T21:21:18.4730023Z AWS_SECRET_ACCESS_KEY: *** 2025-08-14T21:21:18.4740323Z AWS_SESSION_TOKEN: *** 2025-08-14T21:21:18.4740799Z ##[endgroup] 2025-08-14T21:21:18.4921939Z ##[group]Run .github/scripts/parse_ref.py 2025-08-14T21:21:18.4922637Z .github/scripts/parse_ref.py 2025-08-14T21:21:18.4972999Z shell: /usr/bin/bash -e {0} 2025-08-14T21:21:18.4973474Z env: 2025-08-14T21:21:18.4973849Z GIT_DEFAULT_BRANCH: main 2025-08-14T21:21:18.4974526Z RUNNER_ARTIFACT_DIR: /home/pytorchci/actions-runner/_work/_temp/artifacts 2025-08-14T21:21:18.4975546Z RUNNER_TEST_RESULTS_DIR: /home/pytorchci/actions-runner/_work/_temp/test-results 2025-08-14T21:21:18.4976449Z RUNNER_DOCS_DIR: /home/pytorchci/actions-runner/_work/_temp/docs 2025-08-14T21:21:18.4978030Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --device /dev/dri --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-08-14T21:21:18.4979527Z AWS_DEFAULT_REGION: us-east-1 2025-08-14T21:21:18.4980122Z AWS_REGION: us-east-1 2025-08-14T21:21:18.4980803Z AWS_ACCESS_KEY_ID: *** 2025-08-14T21:21:18.4981544Z AWS_SECRET_ACCESS_KEY: *** 2025-08-14T21:21:18.4991805Z AWS_SESSION_TOKEN: *** 2025-08-14T21:21:18.4992265Z ##[endgroup] 2025-08-14T21:21:18.5312775Z Setting output branch=main 2025-08-14T21:21:18.5532321Z Prepare all required actions 2025-08-14T21:21:18.5533633Z Getting action download info 2025-08-14T21:21:18.7432496Z ##[group]Run ./.github/actions/filter-test-configs 2025-08-14T21:21:18.7433118Z with: 2025-08-14T21:21:18.7433823Z github-token: *** 2025-08-14T21:21:18.7436422Z test-matrix: {"include": [{"config": "default", "shard": 1, "num_shards": 6, "runner": "linux.rocm.gpu.2"}, {"config": "default", "shard": 2, "num_shards": 6, "runner": "linux.rocm.gpu.2"}, {"config": "default", "shard": 3, "num_shards": 6, "runner": "linux.rocm.gpu.2"}, {"config": "default", "shard": 4, "num_shards": 6, "runner": "linux.rocm.gpu.2"}, {"config": "default", "shard": 5, "num_shards": 6, "runner": "linux.rocm.gpu.2"}, {"config": "default", "shard": 6, "num_shards": 6, "runner": "linux.rocm.gpu.2"}]} 2025-08-14T21:21:18.7439311Z job-name: linux-jammy-rocm-py3.10 / test (default, 2, 6, linux.rocm.gpu.2) 2025-08-14T21:21:18.7440070Z env: 2025-08-14T21:21:18.7440657Z GIT_DEFAULT_BRANCH: main 2025-08-14T21:21:18.7441386Z RUNNER_ARTIFACT_DIR: /home/pytorchci/actions-runner/_work/_temp/artifacts 2025-08-14T21:21:18.7442416Z RUNNER_TEST_RESULTS_DIR: /home/pytorchci/actions-runner/_work/_temp/test-results 2025-08-14T21:21:18.7443846Z RUNNER_DOCS_DIR: /home/pytorchci/actions-runner/_work/_temp/docs 2025-08-14T21:21:18.7445516Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --device /dev/dri --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-08-14T21:21:18.7447024Z AWS_DEFAULT_REGION: us-east-1 2025-08-14T21:21:18.7447530Z AWS_REGION: us-east-1 2025-08-14T21:21:18.7448062Z AWS_ACCESS_KEY_ID: *** 2025-08-14T21:21:18.7448777Z AWS_SECRET_ACCESS_KEY: *** 2025-08-14T21:21:18.7459087Z AWS_SESSION_TOKEN: *** 2025-08-14T21:21:18.7459533Z ##[endgroup] 2025-08-14T21:21:18.7522604Z ##[group]Run nick-fields/retry@v3.0.0 2025-08-14T21:21:18.7523138Z with: 2025-08-14T21:21:18.7523498Z shell: bash 2025-08-14T21:21:18.7523903Z timeout_minutes: 10 2025-08-14T21:21:18.7524325Z max_attempts: 5 2025-08-14T21:21:18.7524730Z retry_wait_seconds: 30 2025-08-14T21:21:18.7526068Z command: set -eux # PyYAML 6.0 doesn't work with MacOS x86 anymore # This must run on Python-3.7 (AmazonLinux2) so can't use request=3.32.2 python3 -m pip install requests==2.27.1 pyyaml==6.0.2 2025-08-14T21:21:18.7527457Z polling_interval_seconds: 1 2025-08-14T21:21:18.7527956Z warning_on_retry: true 2025-08-14T21:21:18.7528394Z continue_on_error: false 2025-08-14T21:21:18.7528843Z env: 2025-08-14T21:21:18.7529225Z GIT_DEFAULT_BRANCH: main 2025-08-14T21:21:18.7529926Z RUNNER_ARTIFACT_DIR: /home/pytorchci/actions-runner/_work/_temp/artifacts 2025-08-14T21:21:18.7530978Z RUNNER_TEST_RESULTS_DIR: /home/pytorchci/actions-runner/_work/_temp/test-results 2025-08-14T21:21:18.7531951Z RUNNER_DOCS_DIR: /home/pytorchci/actions-runner/_work/_temp/docs 2025-08-14T21:21:18.7533610Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --device /dev/dri --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-08-14T21:21:18.7535123Z AWS_DEFAULT_REGION: us-east-1 2025-08-14T21:21:18.7535621Z AWS_REGION: us-east-1 2025-08-14T21:21:18.7536177Z AWS_ACCESS_KEY_ID: *** 2025-08-14T21:21:18.7536851Z AWS_SECRET_ACCESS_KEY: *** 2025-08-14T21:21:18.7547115Z AWS_SESSION_TOKEN: *** 2025-08-14T21:21:18.7547804Z GITHUB_TOKEN: *** 2025-08-14T21:21:18.7548220Z ##[endgroup] 2025-08-14T21:21:18.8340323Z + python3 -m pip install requests==2.27.1 pyyaml==6.0.2 2025-08-14T21:21:19.0836179Z Defaulting to user installation because normal site-packages is not writeable 2025-08-14T21:21:19.1746564Z Requirement already satisfied: requests==2.27.1 in /home/pytorchci/.local/lib/python3.10/site-packages (2.27.1) 2025-08-14T21:21:19.1749332Z Requirement already satisfied: pyyaml==6.0.2 in /home/pytorchci/.local/lib/python3.10/site-packages (6.0.2) 2025-08-14T21:21:19.1840709Z Requirement already satisfied: certifi>=2017.4.17 in /usr/lib/python3/dist-packages (from requests==2.27.1) (2020.6.20) 2025-08-14T21:21:19.1848801Z Requirement already satisfied: charset-normalizer~=2.0.0 in /home/pytorchci/.local/lib/python3.10/site-packages (from requests==2.27.1) (2.0.12) 2025-08-14T21:21:19.1852564Z Requirement already satisfied: urllib3<1.27,>=1.21.1 in /usr/lib/python3/dist-packages (from requests==2.27.1) (1.26.5) 2025-08-14T21:21:19.1862250Z Requirement already satisfied: idna<4,>=2.5 in /usr/lib/python3/dist-packages (from requests==2.27.1) (3.3) 2025-08-14T21:21:19.8338265Z Command completed after 1 attempt(s). 2025-08-14T21:21:19.8449897Z ##[group]Run set -x 2025-08-14T21:21:19.8450391Z set -x 2025-08-14T21:21:19.8450788Z  2025-08-14T21:21:19.8451450Z # Use relative path here as this could be checked out anywhere, not necessarily 2025-08-14T21:21:19.8452275Z # in runner workspace 2025-08-14T21:21:19.8452984Z python3 "${GITHUB_ACTION_PATH}/../../scripts/parse_ref.py" 2025-08-14T21:21:19.8504892Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-08-14T21:21:19.8506004Z env: 2025-08-14T21:21:19.8506402Z GIT_DEFAULT_BRANCH: main 2025-08-14T21:21:19.8507117Z RUNNER_ARTIFACT_DIR: /home/pytorchci/actions-runner/_work/_temp/artifacts 2025-08-14T21:21:19.8508174Z RUNNER_TEST_RESULTS_DIR: /home/pytorchci/actions-runner/_work/_temp/test-results 2025-08-14T21:21:19.8509182Z RUNNER_DOCS_DIR: /home/pytorchci/actions-runner/_work/_temp/docs 2025-08-14T21:21:19.8510854Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --device /dev/dri --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-08-14T21:21:19.8512365Z AWS_DEFAULT_REGION: us-east-1 2025-08-14T21:21:19.8512877Z AWS_REGION: us-east-1 2025-08-14T21:21:19.8513441Z AWS_ACCESS_KEY_ID: *** 2025-08-14T21:21:19.8514104Z AWS_SECRET_ACCESS_KEY: *** 2025-08-14T21:21:19.8524556Z AWS_SESSION_TOKEN: *** 2025-08-14T21:21:19.8525019Z ##[endgroup] 2025-08-14T21:21:19.8607682Z + python3 /home/pytorchci/actions-runner/_work/pytorch/pytorch/./.github/actions/filter-test-configs/../../scripts/parse_ref.py 2025-08-14T21:21:19.8862312Z Setting output branch=main 2025-08-14T21:21:19.8936188Z ##[group]Run echo "Workflow: ${GITHUB_WORKFLOW}" 2025-08-14T21:21:19.8936900Z echo "Workflow: ${GITHUB_WORKFLOW}" 2025-08-14T21:21:19.8937477Z echo "Job name: ${JOB_NAME}" 2025-08-14T21:21:19.8937988Z  2025-08-14T21:21:19.8938637Z # Use relative path here as this could be checked out anywhere, not necessarily 2025-08-14T21:21:19.8939460Z # in runner workspace 2025-08-14T21:21:19.8940195Z python3 "${GITHUB_ACTION_PATH}/../../scripts/filter_test_configs.py" \ 2025-08-14T21:21:19.8941033Z  --workflow "${GITHUB_WORKFLOW}" \ 2025-08-14T21:21:19.8941703Z  --job-name "${JOB_NAME}" \ 2025-08-14T21:21:19.8944876Z  --test-matrix "{"include": [{"config": "default", "shard": 1, "num_shards": 6, "runner": "linux.rocm.gpu.2"}, {"config": "default", "shard": 2, "num_shards": 6, "runner": "linux.rocm.gpu.2"}, {"config": "default", "shard": 3, "num_shards": 6, "runner": "linux.rocm.gpu.2"}, {"config": "default", "shard": 4, "num_shards": 6, "runner": "linux.rocm.gpu.2"}, {"config": "default", "shard": 5, "num_shards": 6, "runner": "linux.rocm.gpu.2"}, {"config": "default", "shard": 6, "num_shards": 6, "runner": "linux.rocm.gpu.2"}]}" \ 2025-08-14T21:21:19.8947660Z  --selected-test-configs "" \ 2025-08-14T21:21:19.8948252Z  --pr-number "${PR_NUMBER}" \ 2025-08-14T21:21:19.8948792Z  --tag "${TAG}" \ 2025-08-14T21:21:19.8949311Z  --event-name "${EVENT_NAME}" \ 2025-08-14T21:21:19.8949867Z  --schedule "${SCHEDULE}" \ 2025-08-14T21:21:19.8950396Z  --branch "${HEAD_BRANCH}" 2025-08-14T21:21:19.9004047Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-08-14T21:21:19.9005136Z env: 2025-08-14T21:21:19.9005559Z GIT_DEFAULT_BRANCH: main 2025-08-14T21:21:19.9006283Z RUNNER_ARTIFACT_DIR: /home/pytorchci/actions-runner/_work/_temp/artifacts 2025-08-14T21:21:19.9007378Z RUNNER_TEST_RESULTS_DIR: /home/pytorchci/actions-runner/_work/_temp/test-results 2025-08-14T21:21:19.9008364Z RUNNER_DOCS_DIR: /home/pytorchci/actions-runner/_work/_temp/docs 2025-08-14T21:21:19.9010160Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --device /dev/dri --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-08-14T21:21:19.9011939Z AWS_DEFAULT_REGION: us-east-1 2025-08-14T21:21:19.9012522Z AWS_REGION: us-east-1 2025-08-14T21:21:19.9013187Z AWS_ACCESS_KEY_ID: *** 2025-08-14T21:21:19.9013977Z AWS_SECRET_ACCESS_KEY: *** 2025-08-14T21:21:19.9025129Z AWS_SESSION_TOKEN: *** 2025-08-14T21:21:19.9025839Z GITHUB_TOKEN: *** 2025-08-14T21:21:19.9026532Z JOB_NAME: linux-jammy-rocm-py3.10 / test (default, 2, 6, linux.rocm.gpu.2) 2025-08-14T21:21:19.9027284Z PR_NUMBER: 2025-08-14T21:21:19.9027646Z TAG: 2025-08-14T21:21:19.9028373Z EVENT_NAME: push 2025-08-14T21:21:19.9028774Z SCHEDULE: 2025-08-14T21:21:19.9029146Z HEAD_BRANCH: main 2025-08-14T21:21:19.9029542Z ##[endgroup] 2025-08-14T21:21:19.9112226Z Workflow: rocm 2025-08-14T21:21:19.9113000Z Job name: linux-jammy-rocm-py3.10 / test (default, 2, 6, linux.rocm.gpu.2) 2025-08-14T21:21:20.5550113Z Setting output keep-going=True 2025-08-14T21:21:20.5550946Z Setting output ci-verbose-test-logs=False 2025-08-14T21:21:20.5551598Z Setting output ci-test-showlocals=False 2025-08-14T21:21:20.5552192Z Setting output ci-no-test-timeout=False 2025-08-14T21:21:20.5552793Z Setting output ci-no-td=False 2025-08-14T21:21:20.5553425Z Setting output ci-td-distributed=False 2025-08-14T21:21:20.5554001Z Setting output is-unstable=False 2025-08-14T21:21:20.5554530Z Setting output reenabled-issues= 2025-08-14T21:21:20.5557407Z Setting output test-matrix={"include": [{"config": "default", "shard": 1, "num_shards": 6, "runner": "linux.rocm.gpu.2"}, {"config": "default", "shard": 2, "num_shards": 6, "runner": "linux.rocm.gpu.2"}, {"config": "default", "shard": 3, "num_shards": 6, "runner": "linux.rocm.gpu.2"}, {"config": "default", "shard": 4, "num_shards": 6, "runner": "linux.rocm.gpu.2"}, {"config": "default", "shard": 5, "num_shards": 6, "runner": "linux.rocm.gpu.2"}, {"config": "default", "shard": 6, "num_shards": 6, "runner": "linux.rocm.gpu.2"}]} 2025-08-14T21:21:20.5560268Z Setting output is-test-matrix-empty=False 2025-08-14T21:21:20.5739046Z ##[group]Run echo "Filtered matrix:" 2025-08-14T21:21:20.5739645Z echo "Filtered matrix:" 2025-08-14T21:21:20.5742348Z echo "{"include": [{"config": "default", "shard": 1, "num_shards": 6, "runner": "linux.rocm.gpu.2"}, {"config": "default", "shard": 2, "num_shards": 6, "runner": "linux.rocm.gpu.2"}, {"config": "default", "shard": 3, "num_shards": 6, "runner": "linux.rocm.gpu.2"}, {"config": "default", "shard": 4, "num_shards": 6, "runner": "linux.rocm.gpu.2"}, {"config": "default", "shard": 5, "num_shards": 6, "runner": "linux.rocm.gpu.2"}, {"config": "default", "shard": 6, "num_shards": 6, "runner": "linux.rocm.gpu.2"}]}" 2025-08-14T21:21:20.5745008Z  2025-08-14T21:21:20.5745365Z echo 2025-08-14T21:21:20.5745838Z echo "Is the current job unstable? False" 2025-08-14T21:21:20.5746406Z  2025-08-14T21:21:20.5746756Z echo 2025-08-14T21:21:20.5747201Z echo "Is keep-going label set? True" 2025-08-14T21:21:20.5747740Z  2025-08-14T21:21:20.5748093Z echo 2025-08-14T21:21:20.5748498Z echo "Reenabled issues? " 2025-08-14T21:21:20.5799296Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-08-14T21:21:20.5799974Z env: 2025-08-14T21:21:20.5800475Z GIT_DEFAULT_BRANCH: main 2025-08-14T21:21:20.5801226Z RUNNER_ARTIFACT_DIR: /home/pytorchci/actions-runner/_work/_temp/artifacts 2025-08-14T21:21:20.5802610Z RUNNER_TEST_RESULTS_DIR: /home/pytorchci/actions-runner/_work/_temp/test-results 2025-08-14T21:21:20.5803595Z RUNNER_DOCS_DIR: /home/pytorchci/actions-runner/_work/_temp/docs 2025-08-14T21:21:20.5805261Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --device /dev/dri --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-08-14T21:21:20.5806759Z AWS_DEFAULT_REGION: us-east-1 2025-08-14T21:21:20.5807244Z AWS_REGION: us-east-1 2025-08-14T21:21:20.5807800Z AWS_ACCESS_KEY_ID: *** 2025-08-14T21:21:20.5808455Z AWS_SECRET_ACCESS_KEY: *** 2025-08-14T21:21:20.5818803Z AWS_SESSION_TOKEN: *** 2025-08-14T21:21:20.5819246Z ##[endgroup] 2025-08-14T21:21:20.5898655Z Filtered matrix: 2025-08-14T21:21:20.5901334Z {include: [{config: default, shard: 1, num_shards: 6, runner: linux.rocm.gpu.2}, {config: default, shard: 2, num_shards: 6, runner: linux.rocm.gpu.2}, {config: default, shard: 3, num_shards: 6, runner: linux.rocm.gpu.2}, {config: default, shard: 4, num_shards: 6, runner: linux.rocm.gpu.2}, {config: default, shard: 5, num_shards: 6, runner: linux.rocm.gpu.2}, {config: default, shard: 6, num_shards: 6, runner: linux.rocm.gpu.2}]} 2025-08-14T21:21:20.5908366Z 2025-08-14T21:21:20.5908594Z Is the current job unstable? False 2025-08-14T21:21:20.5908956Z 2025-08-14T21:21:20.5909166Z Is keep-going label set? True 2025-08-14T21:21:20.5909490Z 2025-08-14T21:21:20.5909670Z Reenabled issues? 2025-08-14T21:21:20.5975261Z ##[group]Run echo "timeout=$((JOB_TIMEOUT-30))" >> "${GITHUB_OUTPUT}" 2025-08-14T21:21:20.5976198Z echo "timeout=$((JOB_TIMEOUT-30))" >> "${GITHUB_OUTPUT}" 2025-08-14T21:21:20.6021216Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-08-14T21:21:20.6021885Z env: 2025-08-14T21:21:20.6022279Z GIT_DEFAULT_BRANCH: main 2025-08-14T21:21:20.6022996Z RUNNER_ARTIFACT_DIR: /home/pytorchci/actions-runner/_work/_temp/artifacts 2025-08-14T21:21:20.6024048Z RUNNER_TEST_RESULTS_DIR: /home/pytorchci/actions-runner/_work/_temp/test-results 2025-08-14T21:21:20.6025012Z RUNNER_DOCS_DIR: /home/pytorchci/actions-runner/_work/_temp/docs 2025-08-14T21:21:20.6026684Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --device /dev/dri --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-08-14T21:21:20.6028175Z AWS_DEFAULT_REGION: us-east-1 2025-08-14T21:21:20.6028735Z AWS_REGION: us-east-1 2025-08-14T21:21:20.6029291Z AWS_ACCESS_KEY_ID: *** 2025-08-14T21:21:20.6029943Z AWS_SECRET_ACCESS_KEY: *** 2025-08-14T21:21:20.6040217Z AWS_SESSION_TOKEN: *** 2025-08-14T21:21:20.6040832Z JOB_TIMEOUT: 300 2025-08-14T21:21:20.6041230Z ##[endgroup] 2025-08-14T21:21:20.6194405Z ##[group]Run set -x 2025-08-14T21:21:20.6194960Z set -x 2025-08-14T21:21:20.6195354Z  2025-08-14T21:21:20.6195795Z if [[ $TEST_CONFIG == 'multigpu' ]]; then 2025-08-14T21:21:20.6196475Z  TEST_COMMAND=.ci/pytorch/multigpu-test.sh 2025-08-14T21:21:20.6197153Z elif [[ $BUILD_ENVIRONMENT == *onnx* ]]; then 2025-08-14T21:21:20.6197796Z  TEST_COMMAND=.ci/caffe2/test.sh 2025-08-14T21:21:20.6198317Z else 2025-08-14T21:21:20.6198750Z  TEST_COMMAND=.ci/pytorch/test.sh 2025-08-14T21:21:20.6199284Z fi 2025-08-14T21:21:20.6199635Z  2025-08-14T21:21:20.6200203Z # detached container should get cleaned up by teardown_ec2_linux 2025-08-14T21:21:20.6201230Z # TODO: Stop building test binaries as part of the build phase 2025-08-14T21:21:20.6202034Z # Used for GPU_FLAG since that doesn't play nice 2025-08-14T21:21:20.6202720Z # shellcheck disable=SC2086,SC2090 2025-08-14T21:21:20.6203293Z container_name=$(docker run \ 2025-08-14T21:21:20.6203829Z  ${GPU_FLAG:-} \ 2025-08-14T21:21:20.6204306Z  -e BUILD_ENVIRONMENT \ 2025-08-14T21:21:20.6204805Z  -e PR_NUMBER \ 2025-08-14T21:21:20.6205267Z  -e GITHUB_ACTIONS \ 2025-08-14T21:21:20.6205754Z  -e GITHUB_REPOSITORY \ 2025-08-14T21:21:20.6206270Z  -e GITHUB_WORKFLOW \ 2025-08-14T21:21:20.6206751Z  -e GITHUB_JOB \ 2025-08-14T21:21:20.6207218Z  -e GITHUB_RUN_ID \ 2025-08-14T21:21:20.6207700Z  -e GITHUB_RUN_NUMBER \ 2025-08-14T21:21:20.6208201Z  -e GITHUB_RUN_ATTEMPT \ 2025-08-14T21:21:20.6208702Z  -e JOB_ID \ 2025-08-14T21:21:20.6209138Z  -e JOB_NAME \ 2025-08-14T21:21:20.6209571Z  -e BRANCH \ 2025-08-14T21:21:20.6209979Z  -e SHA1 \ 2025-08-14T21:21:20.6210426Z  -e AWS_DEFAULT_REGION \ 2025-08-14T21:21:20.6210940Z  -e IN_WHEEL_TEST \ 2025-08-14T21:21:20.6211412Z  -e SHARD_NUMBER \ 2025-08-14T21:21:20.6211882Z  -e TEST_CONFIG \ 2025-08-14T21:21:20.6212349Z  -e NUM_TEST_SHARDS \ 2025-08-14T21:21:20.6212854Z  -e REENABLED_ISSUES \ 2025-08-14T21:21:20.6213370Z  -e CONTINUE_THROUGH_ERROR \ 2025-08-14T21:21:20.6213896Z  -e VERBOSE_TEST_LOGS \ 2025-08-14T21:21:20.6214766Z  -e TEST_SHOWLOCALS \ 2025-08-14T21:21:20.6215243Z  -e NO_TEST_TIMEOUT \ 2025-08-14T21:21:20.6215703Z  -e NO_TD \ 2025-08-14T21:21:20.6216187Z  -e MAX_JOBS="$(nproc --ignore=2)" \ 2025-08-14T21:21:20.6216799Z  -e PYTORCH_TEST_CUDA_MEM_LEAK_CHECK \ 2025-08-14T21:21:20.6217415Z  -e PYTORCH_TEST_RERUN_DISABLED_TESTS \ 2025-08-14T21:21:20.6217991Z  -e TESTS_TO_INCLUDE \ 2025-08-14T21:21:20.6218481Z  -e DASHBOARD_TAG \ 2025-08-14T21:21:20.6219124Z  --env-file="${RUNNER_TEMP}/github_env_${GITHUB_RUN_ID}" \ 2025-08-14T21:21:20.6219859Z  --ulimit stack=10485760:83886080 \ 2025-08-14T21:21:20.6220513Z  --ulimit core=0 \ 2025-08-14T21:21:20.6221127Z  --security-opt seccomp=unconfined \ 2025-08-14T21:21:20.6221758Z  --cap-add=SYS_PTRACE \ 2025-08-14T21:21:20.6222252Z  --shm-size="8g" \ 2025-08-14T21:21:20.6222693Z  --tty \ 2025-08-14T21:21:20.6223094Z  --detach \ 2025-08-14T21:21:20.6223540Z  --name="${container_name}" \ 2025-08-14T21:21:20.6224074Z  --user jenkins \ 2025-08-14T21:21:20.6224663Z  -v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \ 2025-08-14T21:21:20.6225336Z  -w /var/lib/jenkins/workspace \ 2025-08-14T21:21:20.6225868Z  "${DOCKER_IMAGE}" 2025-08-14T21:21:20.6226291Z ) 2025-08-14T21:21:20.6226710Z # save container name for later step 2025-08-14T21:21:20.6227785Z echo "CONTAINER_NAME=${container_name}" >> "$GITHUB_ENV" 2025-08-14T21:21:20.6229081Z # jenkins user does not have write permission to mounted workspace; work-around by copying within container to jenkins home 2025-08-14T21:21:20.6230691Z docker exec -t "${container_name}" sh -c "cd .. && cp -R workspace pytorch && cd pytorch && pip install dist/*.whl && ${TEST_COMMAND}" 2025-08-14T21:21:20.6276172Z shell: /usr/bin/bash -e {0} 2025-08-14T21:21:20.6276683Z env: 2025-08-14T21:21:20.6277073Z GIT_DEFAULT_BRANCH: main 2025-08-14T21:21:20.6277794Z RUNNER_ARTIFACT_DIR: /home/pytorchci/actions-runner/_work/_temp/artifacts 2025-08-14T21:21:20.6278829Z RUNNER_TEST_RESULTS_DIR: /home/pytorchci/actions-runner/_work/_temp/test-results 2025-08-14T21:21:20.6279827Z RUNNER_DOCS_DIR: /home/pytorchci/actions-runner/_work/_temp/docs 2025-08-14T21:21:20.6281628Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --device /dev/dri --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-08-14T21:21:20.6283131Z AWS_DEFAULT_REGION: us-east-1 2025-08-14T21:21:20.6283632Z AWS_REGION: us-east-1 2025-08-14T21:21:20.6284188Z AWS_ACCESS_KEY_ID: *** 2025-08-14T21:21:20.6284849Z AWS_SECRET_ACCESS_KEY: *** 2025-08-14T21:21:20.6295117Z AWS_SESSION_TOKEN: *** 2025-08-14T21:21:20.6295651Z BUILD_ENVIRONMENT: linux-jammy-rocm-py3.10 2025-08-14T21:21:20.6296232Z PR_NUMBER: 2025-08-14T21:21:20.6296675Z GITHUB_REPOSITORY: pytorch/pytorch 2025-08-14T21:21:20.6297217Z GITHUB_WORKFLOW: rocm 2025-08-14T21:21:20.6297638Z GITHUB_JOB: test 2025-08-14T21:21:20.6298039Z GITHUB_RUN_ID: 16976255019 2025-08-14T21:21:20.6298497Z GITHUB_RUN_NUMBER: 29182 2025-08-14T21:21:20.6298941Z GITHUB_RUN_ATTEMPT: 1 2025-08-14T21:21:20.6299350Z JOB_ID: 48127805673 2025-08-14T21:21:20.6300011Z JOB_NAME: linux-jammy-rocm-py3.10 / test (default, 2, 6, linux.rocm.gpu.2) 2025-08-14T21:21:20.6300754Z BRANCH: main 2025-08-14T21:21:20.6301206Z SHA1: 1fc683cf17c8c673044538d10266c00f92987be2 2025-08-14T21:21:20.6301782Z CONTINUE_THROUGH_ERROR: True 2025-08-14T21:21:20.6302259Z VERBOSE_TEST_LOGS: False 2025-08-14T21:21:20.6302706Z TEST_SHOWLOCALS: False 2025-08-14T21:21:20.6303133Z NO_TEST_TIMEOUT: False 2025-08-14T21:21:20.6303539Z NO_TD: False 2025-08-14T21:21:20.6303916Z TEST_CONFIG: default 2025-08-14T21:21:20.6304327Z SHARD_NUMBER: 2 2025-08-14T21:21:20.6304721Z NUM_TEST_SHARDS: 6 2025-08-14T21:21:20.6305524Z REENABLED_ISSUES: 2025-08-14T21:21:20.6306737Z DOCKER_IMAGE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-rocm-n-py3-bfa89110622ba7202628e9faac705f183070defe 2025-08-14T21:21:20.6308096Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK: 0 2025-08-14T21:21:20.6308654Z PYTORCH_TEST_RERUN_DISABLED_TESTS: 0 2025-08-14T21:21:20.6309167Z TESTS_TO_INCLUDE: 2025-08-14T21:21:20.6309563Z DASHBOARD_TAG: 2025-08-14T21:21:20.6309972Z ##[endgroup] 2025-08-14T21:21:20.6382056Z + [[ default == \m\u\l\t\i\g\p\u ]] 2025-08-14T21:21:20.6382789Z + [[ linux-jammy-rocm-py3.10 == *onnx* ]] 2025-08-14T21:21:20.6383396Z + TEST_COMMAND=.ci/pytorch/test.sh 2025-08-14T21:21:20.6396344Z +++ nproc --ignore=2 2025-08-14T21:21:20.6425840Z ++ docker run --device=/dev/mem --device=/dev/kfd --device /dev/dri --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host -e BUILD_ENVIRONMENT -e PR_NUMBER -e GITHUB_ACTIONS -e GITHUB_REPOSITORY -e GITHUB_WORKFLOW -e GITHUB_JOB -e GITHUB_RUN_ID -e GITHUB_RUN_NUMBER -e GITHUB_RUN_ATTEMPT -e JOB_ID -e JOB_NAME -e BRANCH -e SHA1 -e AWS_DEFAULT_REGION -e IN_WHEEL_TEST -e SHARD_NUMBER -e TEST_CONFIG -e NUM_TEST_SHARDS -e REENABLED_ISSUES -e CONTINUE_THROUGH_ERROR -e VERBOSE_TEST_LOGS -e TEST_SHOWLOCALS -e NO_TEST_TIMEOUT -e NO_TD -e MAX_JOBS=62 -e PYTORCH_TEST_CUDA_MEM_LEAK_CHECK -e PYTORCH_TEST_RERUN_DISABLED_TESTS -e TESTS_TO_INCLUDE -e DASHBOARD_TAG --env-file=/home/pytorchci/actions-runner/_work/_temp/github_env_16976255019 --ulimit stack=10485760:83886080 --ulimit core=0 --security-opt seccomp=unconfined --cap-add=SYS_PTRACE --shm-size=8g --tty --detach --name= --user jenkins -v /home/pytorchci/actions-runner/_work/pytorch/pytorch:/var/lib/jenkins/workspace -w /var/lib/jenkins/workspace 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-rocm-n-py3-bfa89110622ba7202628e9faac705f183070defe 2025-08-14T21:21:24.8809523Z + container_name=3d9c9b40c40dc5f1fb23a7678a8237a367285d971448e6116aec4b368a1c9e14 2025-08-14T21:21:24.8810339Z + echo CONTAINER_NAME=3d9c9b40c40dc5f1fb23a7678a8237a367285d971448e6116aec4b368a1c9e14 2025-08-14T21:21:24.8811706Z + docker exec -t 3d9c9b40c40dc5f1fb23a7678a8237a367285d971448e6116aec4b368a1c9e14 sh -c 'cd .. && cp -R workspace pytorch && cd pytorch && pip install dist/*.whl && .ci/pytorch/test.sh' 2025-08-14T21:21:38.6785724Z Processing ./dist/torch-2.9.0a0+git1fc683c-cp310-cp310-linux_x86_64.whl 2025-08-14T21:21:39.4571462Z Requirement already satisfied: filelock in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch==2.9.0a0+git1fc683c) (3.18.0) 2025-08-14T21:21:39.4573516Z Requirement already satisfied: typing-extensions>=4.10.0 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch==2.9.0a0+git1fc683c) (4.14.1) 2025-08-14T21:21:39.4575472Z Requirement already satisfied: sympy>=1.13.3 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch==2.9.0a0+git1fc683c) (1.13.3) 2025-08-14T21:21:39.4579620Z Requirement already satisfied: networkx>=2.5.1 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch==2.9.0a0+git1fc683c) (2.8.8) 2025-08-14T21:21:39.4581445Z Requirement already satisfied: jinja2 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch==2.9.0a0+git1fc683c) (3.1.6) 2025-08-14T21:21:39.4587026Z Requirement already satisfied: fsspec>=0.8.5 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch==2.9.0a0+git1fc683c) (2025.5.1) 2025-08-14T21:21:39.4886633Z Requirement already satisfied: mpmath<1.4,>=1.1.0 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from sympy>=1.13.3->torch==2.9.0a0+git1fc683c) (1.3.0) 2025-08-14T21:21:39.4929694Z Requirement already satisfied: MarkupSafe>=2.0 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from jinja2->torch==2.9.0a0+git1fc683c) (3.0.2) 2025-08-14T21:21:39.8462142Z Installing collected packages: torch 2025-08-14T21:21:50.3957623Z ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts. 2025-08-14T21:21:50.3960312Z helion 0.1.0 requires filecheck, which is not installed. 2025-08-14T21:21:50.3961474Z Successfully installed torch-2.9.0a0+git1fc683c 2025-08-14T21:21:50.4420224Z + export TERM=vt100 2025-08-14T21:21:50.4420722Z + TERM=vt100 2025-08-14T21:21:50.4426256Z ++ dirname .ci/pytorch/test.sh 2025-08-14T21:21:50.4442261Z + source .ci/pytorch/common.sh 2025-08-14T21:21:50.4449452Z +++ dirname .ci/pytorch/common.sh 2025-08-14T21:21:50.4468359Z ++ source .ci/pytorch/common_utils.sh 2025-08-14T21:21:50.4469733Z +++ declare -f -t trap_add 2025-08-14T21:21:50.4476975Z ++ set -ex -o pipefail 2025-08-14T21:21:50.4477589Z ++ [[ linux-jammy-rocm-py3.10 == *rocm* ]] 2025-08-14T21:21:50.4478192Z ++ unset HIP_PLATFORM 2025-08-14T21:21:50.4478669Z ++ export PYTORCH_TEST_WITH_ROCM=1 2025-08-14T21:21:50.4479213Z ++ PYTORCH_TEST_WITH_ROCM=1 2025-08-14T21:21:50.4479716Z ++ BUILD_TEST_LIBTORCH=0 2025-08-14T21:21:50.4485170Z ++ dirname .ci/pytorch/test.sh 2025-08-14T21:21:50.4511034Z + source .ci/pytorch/common-build.sh 2025-08-14T21:21:50.4512553Z ++ [[ linux-jammy-rocm-py3.10 != *win-* ]] 2025-08-14T21:21:50.4523755Z ++++ dirname .ci/pytorch/common-build.sh 2025-08-14T21:21:50.4547896Z +++ cd .ci/pytorch 2025-08-14T21:21:50.4548410Z +++ pwd -P 2025-08-14T21:21:50.4553179Z ++ script_dir=/var/lib/jenkins/pytorch/.ci/pytorch 2025-08-14T21:21:50.4553974Z ++ [[ linux-jammy-rocm-py3.10 == *-pch* ]] 2025-08-14T21:21:50.4554544Z ++ which sccache 2025-08-14T21:21:50.4576310Z ++ [[ -z '' ]] 2025-08-14T21:21:50.4576841Z ++ unset SCCACHE_BUCKET 2025-08-14T21:21:50.4577326Z ++ unset SCCACHE_REGION 2025-08-14T21:21:50.4577785Z ++ sccache --stop-server 2025-08-14T21:21:50.4619614Z ++ true 2025-08-14T21:21:50.4620206Z ++ rm -f /var/lib/jenkins/sccache_error.log 2025-08-14T21:21:50.4638321Z ++ trap_add sccache_epilogue EXIT 2025-08-14T21:21:50.4638661Z ++ trap_add_cmd=sccache_epilogue 2025-08-14T21:21:50.4638971Z ++ shift 2025-08-14T21:21:50.4639164Z ++ for trap_add_name in "$@" 2025-08-14T21:21:50.4651250Z ++++ trap -p EXIT 2025-08-14T21:21:50.4655755Z +++ eval 'extract_trap_cmd ' 2025-08-14T21:21:50.4656066Z ++++ extract_trap_cmd 2025-08-14T21:21:50.4656303Z ++++ printf '%s\n' '' 2025-08-14T21:21:50.4656560Z +++ printf '%s\n' sccache_epilogue 2025-08-14T21:21:50.4658947Z ++ trap -- ' 2025-08-14T21:21:50.4659211Z sccache_epilogue' EXIT 2025-08-14T21:21:50.4659471Z ++ [[ -n '' ]] 2025-08-14T21:21:50.4659740Z ++ [[ linux-jammy-rocm-py3.10 == *rocm* ]] 2025-08-14T21:21:50.4660146Z ++ SCCACHE_ERROR_LOG=/var/lib/jenkins/sccache_error.log 2025-08-14T21:21:50.4660492Z ++ SCCACHE_IDLE_TIMEOUT=0 2025-08-14T21:21:50.4660770Z ++ sccache --start-server 2025-08-14T21:21:50.4700448Z sccache: Starting the server... 2025-08-14T21:21:50.4856411Z sccache: Listening on address 127.0.0.1:4226 2025-08-14T21:21:50.4866290Z ++ sccache --zero-stats 2025-08-14T21:21:50.4891817Z Statistics zeroed. 2025-08-14T21:21:50.4895624Z ++ which ccache 2025-08-14T21:21:50.4909528Z + [[ linux-jammy-rocm-py3.10 != *rocm* ]] 2025-08-14T21:21:50.4910150Z + echo 'Environment variables:' 2025-08-14T21:21:50.4910686Z Environment variables: 2025-08-14T21:21:50.4911119Z + env 2025-08-14T21:21:50.4921737Z GITHUB_WORKSPACE=/home/pytorchci/actions-runner/_work/pytorch/pytorch 2025-08-14T21:21:50.4922688Z CONTINUE_THROUGH_ERROR=True 2025-08-14T21:21:50.4923232Z BUILD_ENVIRONMENT=linux-jammy-rocm-py3.10 2025-08-14T21:21:50.4923820Z HOSTNAME=pytorch-rocm-hw-43 2025-08-14T21:21:50.4924937Z GITHUB_PATH=/home/pytorchci/actions-runner/_work/_temp/_runner_file_commands/add_path_f64d5d67-66f1-4bd4-9e61-f9da3af40835 2025-08-14T21:21:50.4926017Z GITHUB_ACTION=__self 2025-08-14T21:21:50.4926472Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=0 2025-08-14T21:21:50.4926996Z GITHUB_RUN_NUMBER=29182 2025-08-14T21:21:50.4927431Z TEST_CONFIG=default 2025-08-14T21:21:50.4927864Z GITHUB_REPOSITORY_OWNER_ID=21003710 2025-08-14T21:21:50.4928390Z AWS_DEFAULT_REGION=us-east-1 2025-08-14T21:21:50.4934383Z GITHUB_TRIGGERING_ACTOR=pytorchmergebot 2025-08-14T21:21:50.4934928Z GITHUB_REF_TYPE=branch 2025-08-14T21:21:50.4935694Z *** 2025-08-14T21:21:50.4936086Z GITHUB_REPOSITORY_ID=65600975 2025-08-14T21:21:50.4936577Z GITHUB_ACTIONS=true 2025-08-14T21:21:50.4937041Z SHA1=1fc683cf17c8c673044538d10266c00f92987be2 2025-08-14T21:21:50.4937656Z GITHUB_SHA=1fc683cf17c8c673044538d10266c00f92987be2 2025-08-14T21:21:50.4938506Z GITHUB_WORKFLOW_REF=pytorch/pytorch/.github/workflows/rocm.yml@refs/heads/main 2025-08-14T21:21:50.4939270Z UCC_HOME=/usr 2025-08-14T21:21:50.4939670Z VERBOSE_TEST_LOGS=False 2025-08-14T21:21:50.4940113Z GITHUB_REF=refs/heads/main 2025-08-14T21:21:50.4940549Z SHARD_NUMBER=2 2025-08-14T21:21:50.4940945Z GITHUB_REF_PROTECTED=true 2025-08-14T21:21:50.4941383Z HOME=/var/lib/jenkins 2025-08-14T21:21:50.4941868Z GITHUB_API_URL=https://api.github.com 2025-08-14T21:21:50.4942468Z PYTORCH_TEST_RERUN_DISABLED_TESTS=0 2025-08-14T21:21:50.4942986Z LANG=C.UTF-8 2025-08-14T21:21:50.4943469Z UCX_COMMIT=cc312eaa4655c0cc5c2bcd796db938f90563bcf6 2025-08-14T21:21:50.4944094Z PYTORCH_TEST_WITH_ROCM=1 2025-08-14T21:21:50.4944528Z NUM_TEST_SHARDS=6 2025-08-14T21:21:50.4944911Z UCX_HOME=/usr 2025-08-14T21:21:50.4945937Z GITHUB_STATE=/home/pytorchci/actions-runner/_work/_temp/_runner_file_commands/save_state_f64d5d67-66f1-4bd4-9e61-f9da3af40835 2025-08-14T21:21:50.4947272Z JOB_NAME=linux-jammy-rocm-py3.10 / test (default, 2, 6, linux.rocm.gpu.2) 2025-08-14T21:21:50.4948028Z MAGMA_HOME=/opt/rocm/magma 2025-08-14T21:21:50.4949368Z GITHUB_ENV=/home/pytorchci/actions-runner/_work/_temp/_runner_file_commands/set_env_f64d5d67-66f1-4bd4-9e61-f9da3af40835 2025-08-14T21:21:50.4950794Z GITHUB_EVENT_PATH=/home/pytorchci/actions-runner/_work/_temp/_github_workflow/event.json 2025-08-14T21:21:50.4951651Z GITHUB_EVENT_NAME=push 2025-08-14T21:21:50.4952061Z DASHBOARD_TAG= 2025-08-14T21:21:50.4952454Z GITHUB_RUN_ID=16976255019 2025-08-14T21:21:50.4953545Z GITHUB_STEP_SUMMARY=/home/pytorchci/actions-runner/_work/_temp/_runner_file_commands/step_summary_f64d5d67-66f1-4bd4-9e61-f9da3af40835 2025-08-14T21:21:50.4954745Z GITHUB_ACTOR=pytorchmergebot 2025-08-14T21:21:50.4955200Z PR_NUMBER= 2025-08-14T21:21:50.4955568Z GITHUB_RUN_ATTEMPT=1 2025-08-14T21:21:50.4956000Z ANACONDA_PYTHON_VERSION=3.10 2025-08-14T21:21:50.4956556Z GITHUB_GRAPHQL_URL=https://api.github.com/graphql 2025-08-14T21:21:50.4957133Z TERM=vt100 2025-08-14T21:21:50.4957488Z INSTALLED_VISION=yes 2025-08-14T21:21:50.4957880Z BRANCH=main 2025-08-14T21:21:50.4958266Z OPENSSL_ROOT_DIR=/opt/openssl 2025-08-14T21:21:50.4958738Z TESTS_TO_INCLUDE= 2025-08-14T21:21:50.4959587Z GITHUB_ACTION_PATH=/home/pytorchci/actions-runner/_work/pytorch/pytorch/./.github/actions/setup-rocm 2025-08-14T21:21:50.4960697Z GITHUB_SERVER_URL=https://github.com 2025-08-14T21:21:50.4961238Z PYTORCH_ROCM_ARCH=gfx90a;gfx942 2025-08-14T21:21:50.4961777Z UCC_COMMIT=0c0fc21559835044ab107199e334f7157d6a0d3d 2025-08-14T21:21:50.4962337Z REENABLED_ISSUES= 2025-08-14T21:21:50.4962726Z SHLVL=1 2025-08-14T21:21:50.4963063Z MAX_JOBS=62 2025-08-14T21:21:50.4963431Z GITHUB_ACTOR_ID=97764156 2025-08-14T21:21:50.4964001Z GITHUB_WORKFLOW_SHA=1fc683cf17c8c673044538d10266c00f92987be2 2025-08-14T21:21:50.4964640Z GITHUB_REF_NAME=main 2025-08-14T21:21:50.4965049Z ROCM_PATH=/opt/rocm 2025-08-14T21:21:50.4965435Z GITHUB_JOB=test 2025-08-14T21:21:50.4965826Z NO_TEST_TIMEOUT=False 2025-08-14T21:21:50.4966275Z GITHUB_REPOSITORY=pytorch/pytorch 2025-08-14T21:21:50.4966766Z LC_ALL=C.UTF-8 2025-08-14T21:21:50.4967155Z GITHUB_RETENTION_DAYS=90 2025-08-14T21:21:50.4967595Z OPENSSL_DIR=/opt/openssl 2025-08-14T21:21:50.4968048Z GITHUB_ACTION_REPOSITORY= 2025-08-14T21:21:50.4969686Z PATH=/opt/cache/bin:/opt/rocm/llvm/bin:/opt/rocm/opencl/bin:/opt/rocm/hip/bin:/opt/rocm/hcc/bin:/opt/rocm/bin:/opt/conda/envs/py_3.10/bin:/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin 2025-08-14T21:21:50.4971341Z GITHUB_BASE_REF= 2025-08-14T21:21:50.4971718Z CI=true 2025-08-14T21:21:50.4972096Z GITHUB_REPOSITORY_OWNER=pytorch 2025-08-14T21:21:50.4972889Z JOB_ID=48127805673 2025-08-14T21:21:50.4973270Z GITHUB_HEAD_REF= 2025-08-14T21:21:50.4973649Z GITHUB_ACTION_REF= 2025-08-14T21:21:50.4974040Z TEST_SHOWLOCALS=False 2025-08-14T21:21:50.4974454Z GITHUB_WORKFLOW=rocm 2025-08-14T21:21:50.4974891Z DEBIAN_FRONTEND=noninteractive 2025-08-14T21:21:50.4975973Z GITHUB_OUTPUT=/home/pytorchci/actions-runner/_work/_temp/_runner_file_commands/set_output_f64d5d67-66f1-4bd4-9e61-f9da3af40835 2025-08-14T21:21:50.4977051Z NO_TD=False 2025-08-14T21:21:50.4977417Z OLDPWD=/var/lib/jenkins 2025-08-14T21:21:50.4977835Z _=/usr/bin/env 2025-08-14T21:21:50.4978393Z ++ python -c 'import site; print(site.getsitepackages()[0])' 2025-08-14T21:21:50.5114501Z + TORCH_INSTALL_DIR=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch 2025-08-14T21:21:50.5115690Z + TORCH_BIN_DIR=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/bin 2025-08-14T21:21:50.5116763Z + TORCH_LIB_DIR=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/lib 2025-08-14T21:21:50.5117820Z + TORCH_TEST_DIR=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/test 2025-08-14T21:21:50.5118583Z + BUILD_DIR=build 2025-08-14T21:21:50.5119021Z + BUILD_RENAMED_DIR=build_renamed 2025-08-14T21:21:50.5119536Z + BUILD_BIN_DIR=build/bin 2025-08-14T21:21:50.5119969Z + SHARD_NUMBER=2 2025-08-14T21:21:50.5120474Z + NUM_TEST_SHARDS=6 2025-08-14T21:21:50.5120913Z + export TORCH_SERIALIZATION_DEBUG=1 2025-08-14T21:21:50.5121439Z + TORCH_SERIALIZATION_DEBUG=1 2025-08-14T21:21:50.5121911Z + export VALGRIND=ON 2025-08-14T21:21:50.5122324Z + VALGRIND=ON 2025-08-14T21:21:50.5123372Z + [[ linux-jammy-rocm-py3.10 == *clang9* ]] 2025-08-14T21:21:50.5124004Z + [[ linux-jammy-rocm-py3.10 == *xpu* ]] 2025-08-14T21:21:50.5124571Z + [[ linux-jammy-rocm-py3.10 == *s390x* ]] 2025-08-14T21:21:50.5125075Z + [[ 0 == \1 ]] 2025-08-14T21:21:50.5125452Z + [[ True == \1 ]] 2025-08-14T21:21:50.5125885Z + [[ linux-jammy-rocm-py3.10 != *bazel* ]] 2025-08-14T21:21:50.5126463Z ++ realpath build/custom_test_artifacts 2025-08-14T21:21:50.5131380Z + CUSTOM_TEST_ARTIFACT_BUILD_DIR=/var/lib/jenkins/pytorch/build/custom_test_artifacts 2025-08-14T21:21:50.5132378Z + [[ -n '' ]] 2025-08-14T21:21:50.5132800Z + echo 'Environment variables' 2025-08-14T21:21:50.5133324Z Environment variables 2025-08-14T21:21:50.5133734Z + env 2025-08-14T21:21:50.5142479Z GITHUB_WORKSPACE=/home/pytorchci/actions-runner/_work/pytorch/pytorch 2025-08-14T21:21:50.5143378Z CONTINUE_THROUGH_ERROR=True 2025-08-14T21:21:50.5143939Z BUILD_ENVIRONMENT=linux-jammy-rocm-py3.10 2025-08-14T21:21:50.5144587Z HOSTNAME=pytorch-rocm-hw-43 2025-08-14T21:21:50.5145685Z GITHUB_PATH=/home/pytorchci/actions-runner/_work/_temp/_runner_file_commands/add_path_f64d5d67-66f1-4bd4-9e61-f9da3af40835 2025-08-14T21:21:50.5146765Z GITHUB_ACTION=__self 2025-08-14T21:21:50.5147223Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=0 2025-08-14T21:21:50.5147732Z GITHUB_RUN_NUMBER=29182 2025-08-14T21:21:50.5148160Z TEST_CONFIG=default 2025-08-14T21:21:50.5148603Z GITHUB_REPOSITORY_OWNER_ID=21003710 2025-08-14T21:21:50.5149152Z AWS_DEFAULT_REGION=us-east-1 2025-08-14T21:21:50.5149672Z GITHUB_TRIGGERING_ACTOR=pytorchmergebot 2025-08-14T21:21:50.5150220Z GITHUB_REF_TYPE=branch 2025-08-14T21:21:50.5150750Z *** 2025-08-14T21:21:50.5151144Z GITHUB_REPOSITORY_ID=65600975 2025-08-14T21:21:50.5151693Z GITHUB_ACTIONS=true 2025-08-14T21:21:50.5152190Z SHA1=1fc683cf17c8c673044538d10266c00f92987be2 2025-08-14T21:21:50.5152833Z GITHUB_SHA=1fc683cf17c8c673044538d10266c00f92987be2 2025-08-14T21:21:50.5153680Z GITHUB_WORKFLOW_REF=pytorch/pytorch/.github/workflows/rocm.yml@refs/heads/main 2025-08-14T21:21:50.5154494Z UCC_HOME=/usr 2025-08-14T21:21:50.5154904Z TORCH_SERIALIZATION_DEBUG=1 2025-08-14T21:21:50.5155384Z VERBOSE_TEST_LOGS=False 2025-08-14T21:21:50.5155821Z GITHUB_REF=refs/heads/main 2025-08-14T21:21:50.5156248Z SHARD_NUMBER=2 2025-08-14T21:21:50.5156650Z GITHUB_REF_PROTECTED=true 2025-08-14T21:21:50.5157089Z HOME=/var/lib/jenkins 2025-08-14T21:21:50.5157581Z GITHUB_API_URL=https://api.github.com 2025-08-14T21:21:50.5158854Z PYTORCH_TEST_RERUN_DISABLED_TESTS=0 2025-08-14T21:21:50.5159451Z LANG=C.UTF-8 2025-08-14T21:21:50.5159970Z UCX_COMMIT=cc312eaa4655c0cc5c2bcd796db938f90563bcf6 2025-08-14T21:21:50.5160647Z PYTORCH_TEST_WITH_ROCM=1 2025-08-14T21:21:50.5161056Z NUM_TEST_SHARDS=6 2025-08-14T21:21:50.5161424Z UCX_HOME=/usr 2025-08-14T21:21:50.5162376Z GITHUB_STATE=/home/pytorchci/actions-runner/_work/_temp/_runner_file_commands/save_state_f64d5d67-66f1-4bd4-9e61-f9da3af40835 2025-08-14T21:21:50.5163615Z JOB_NAME=linux-jammy-rocm-py3.10 / test (default, 2, 6, linux.rocm.gpu.2) 2025-08-14T21:21:50.5164338Z MAGMA_HOME=/opt/rocm/magma 2025-08-14T21:21:50.5165273Z GITHUB_ENV=/home/pytorchci/actions-runner/_work/_temp/_runner_file_commands/set_env_f64d5d67-66f1-4bd4-9e61-f9da3af40835 2025-08-14T21:21:50.5166566Z GITHUB_EVENT_PATH=/home/pytorchci/actions-runner/_work/_temp/_github_workflow/event.json 2025-08-14T21:21:50.5167359Z GITHUB_EVENT_NAME=push 2025-08-14T21:21:50.5167752Z DASHBOARD_TAG= 2025-08-14T21:21:50.5168127Z GITHUB_RUN_ID=16976255019 2025-08-14T21:21:50.5169134Z GITHUB_STEP_SUMMARY=/home/pytorchci/actions-runner/_work/_temp/_runner_file_commands/step_summary_f64d5d67-66f1-4bd4-9e61-f9da3af40835 2025-08-14T21:21:50.5170234Z GITHUB_ACTOR=pytorchmergebot 2025-08-14T21:21:50.5170669Z PR_NUMBER= 2025-08-14T21:21:50.5171012Z GITHUB_RUN_ATTEMPT=1 2025-08-14T21:21:50.5171390Z VALGRIND=ON 2025-08-14T21:21:50.5171741Z ANACONDA_PYTHON_VERSION=3.10 2025-08-14T21:21:50.5172269Z GITHUB_GRAPHQL_URL=https://api.github.com/graphql 2025-08-14T21:21:50.5172816Z TERM=vt100 2025-08-14T21:21:50.5173444Z INSTALLED_VISION=yes 2025-08-14T21:21:50.5173828Z BRANCH=main 2025-08-14T21:21:50.5174192Z OPENSSL_ROOT_DIR=/opt/openssl 2025-08-14T21:21:50.5174626Z TESTS_TO_INCLUDE= 2025-08-14T21:21:50.5175433Z GITHUB_ACTION_PATH=/home/pytorchci/actions-runner/_work/pytorch/pytorch/./.github/actions/setup-rocm 2025-08-14T21:21:50.5176375Z GITHUB_SERVER_URL=https://github.com 2025-08-14T21:21:50.5176888Z PYTORCH_ROCM_ARCH=gfx90a;gfx942 2025-08-14T21:21:50.5177414Z UCC_COMMIT=0c0fc21559835044ab107199e334f7157d6a0d3d 2025-08-14T21:21:50.5177936Z REENABLED_ISSUES= 2025-08-14T21:21:50.5178295Z SHLVL=1 2025-08-14T21:21:50.5178615Z MAX_JOBS=62 2025-08-14T21:21:50.5178969Z GITHUB_ACTOR_ID=97764156 2025-08-14T21:21:50.5179511Z GITHUB_WORKFLOW_SHA=1fc683cf17c8c673044538d10266c00f92987be2 2025-08-14T21:21:50.5180104Z GITHUB_REF_NAME=main 2025-08-14T21:21:50.5180491Z ROCM_PATH=/opt/rocm 2025-08-14T21:21:50.5180856Z GITHUB_JOB=test 2025-08-14T21:21:50.5181215Z NO_TEST_TIMEOUT=False 2025-08-14T21:21:50.5181653Z GITHUB_REPOSITORY=pytorch/pytorch 2025-08-14T21:21:50.5182104Z LC_ALL=C.UTF-8 2025-08-14T21:21:50.5182512Z GITHUB_RETENTION_DAYS=90 2025-08-14T21:21:50.5182926Z OPENSSL_DIR=/opt/openssl 2025-08-14T21:21:50.5183345Z GITHUB_ACTION_REPOSITORY= 2025-08-14T21:21:50.5184884Z PATH=/opt/cache/bin:/opt/rocm/llvm/bin:/opt/rocm/opencl/bin:/opt/rocm/hip/bin:/opt/rocm/hcc/bin:/opt/rocm/bin:/opt/conda/envs/py_3.10/bin:/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin 2025-08-14T21:21:50.5186432Z GITHUB_BASE_REF= 2025-08-14T21:21:50.5186781Z CI=true 2025-08-14T21:21:50.5187140Z GITHUB_REPOSITORY_OWNER=pytorch 2025-08-14T21:21:50.5187576Z JOB_ID=48127805673 2025-08-14T21:21:50.5187933Z GITHUB_HEAD_REF= 2025-08-14T21:21:50.5188291Z GITHUB_ACTION_REF= 2025-08-14T21:21:50.5188663Z TEST_SHOWLOCALS=False 2025-08-14T21:21:50.5189049Z GITHUB_WORKFLOW=rocm 2025-08-14T21:21:50.5189468Z DEBIAN_FRONTEND=noninteractive 2025-08-14T21:21:50.5190497Z GITHUB_OUTPUT=/home/pytorchci/actions-runner/_work/_temp/_runner_file_commands/set_output_f64d5d67-66f1-4bd4-9e61-f9da3af40835 2025-08-14T21:21:50.5191516Z NO_TD=False 2025-08-14T21:21:50.5191869Z OLDPWD=/var/lib/jenkins 2025-08-14T21:21:50.5192250Z _=/usr/bin/env 2025-08-14T21:21:50.5192614Z + echo 'Testing pytorch' 2025-08-14T21:21:50.5193012Z Testing pytorch 2025-08-14T21:21:50.5193381Z + export LANG=C.UTF-8 2025-08-14T21:21:50.5193761Z + LANG=C.UTF-8 2025-08-14T21:21:50.5194107Z + PR_NUMBER= 2025-08-14T21:21:50.5194757Z + [[ default == \d\e\f\a\u\l\t ]] 2025-08-14T21:21:50.5195232Z + export CUDA_VISIBLE_DEVICES=0 2025-08-14T21:21:50.5195671Z + CUDA_VISIBLE_DEVICES=0 2025-08-14T21:21:50.5196097Z + export HIP_VISIBLE_DEVICES=0 2025-08-14T21:21:50.5196541Z + HIP_VISIBLE_DEVICES=0 2025-08-14T21:21:50.5196964Z + [[ default == \d\i\s\t\r\i\b\u\t\e\d ]] 2025-08-14T21:21:50.5197466Z + [[ default == \s\l\o\w ]] 2025-08-14T21:21:50.5197960Z + [[ linux-jammy-rocm-py3.10 == *slow-gradcheck* ]] 2025-08-14T21:21:50.5198548Z + [[ linux-jammy-rocm-py3.10 == *cuda* ]] 2025-08-14T21:21:50.5199078Z + [[ linux-jammy-rocm-py3.10 == *rocm* ]] 2025-08-14T21:21:50.5199613Z + export PYTORCH_TESTING_DEVICE_ONLY_FOR=cuda 2025-08-14T21:21:50.5200158Z + PYTORCH_TESTING_DEVICE_ONLY_FOR=cuda 2025-08-14T21:21:50.5200758Z + [[ default == *crossref* ]] 2025-08-14T21:21:50.5201210Z + [[ linux-jammy-rocm-py3.10 == *rocm* ]] 2025-08-14T21:21:50.5201688Z + export VALGRIND=OFF 2025-08-14T21:21:50.5202066Z + VALGRIND=OFF 2025-08-14T21:21:50.5202420Z + rocminfo 2025-08-14T21:21:50.5280054Z ROCk module version 6.8.5 is loaded 2025-08-14T21:21:50.6112199Z ===================== 2025-08-14T21:21:50.6112796Z HSA System Attributes 2025-08-14T21:21:50.6113368Z ===================== 2025-08-14T21:21:50.6113841Z Runtime Version: 1.15 2025-08-14T21:21:50.6114410Z Runtime Ext Version: 1.7 2025-08-14T21:21:50.6114929Z System Timestamp Freq.: 1000.000000MHz 2025-08-14T21:21:50.6115778Z Sig. Max Wait Duration: 18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count) 2025-08-14T21:21:50.6139520Z Machine Model: LARGE 2025-08-14T21:21:50.6140425Z System Endianness: LITTLE 2025-08-14T21:21:50.6141131Z Mwaitx: DISABLED 2025-08-14T21:21:50.6141650Z XNACK enabled: NO 2025-08-14T21:21:50.6142137Z DMAbuf Support: YES 2025-08-14T21:21:50.6142616Z VMM Support: YES 2025-08-14T21:21:50.6142927Z 2025-08-14T21:21:50.6143098Z ========== 2025-08-14T21:21:50.6143542Z HSA Agents 2025-08-14T21:21:50.6143967Z ========== 2025-08-14T21:21:50.6144366Z ******* 2025-08-14T21:21:50.6144776Z Agent 1 2025-08-14T21:21:50.6145171Z ******* 2025-08-14T21:21:50.6145677Z Name: AMD EPYC 7513 32-Core Processor 2025-08-14T21:21:50.6146334Z Uuid: CPU-XX 2025-08-14T21:21:50.6147014Z Marketing Name: AMD EPYC 7513 32-Core Processor 2025-08-14T21:21:50.6147714Z Vendor Name: CPU 2025-08-14T21:21:50.6148368Z Feature: None specified 2025-08-14T21:21:50.6149016Z Profile: FULL_PROFILE 2025-08-14T21:21:50.6149691Z Float Round Mode: NEAR 2025-08-14T21:21:50.6150367Z Max Queue Number: 0(0x0) 2025-08-14T21:21:50.6151031Z Queue Min Size: 0(0x0) 2025-08-14T21:21:50.6151675Z Queue Max Size: 0(0x0) 2025-08-14T21:21:50.6152312Z Queue Type: MULTI 2025-08-14T21:21:50.6152913Z Node: 0 2025-08-14T21:21:50.6153515Z Device Type: CPU 2025-08-14T21:21:50.6154096Z Cache Info: 2025-08-14T21:21:50.6154593Z L1: 32768(0x8000) KB 2025-08-14T21:21:50.6155184Z Chip ID: 0(0x0) 2025-08-14T21:21:50.6155808Z ASIC Revision: 0(0x0) 2025-08-14T21:21:50.6156470Z Cacheline Size: 64(0x40) 2025-08-14T21:21:50.6157141Z Max Clock Freq. (MHz): 2600 2025-08-14T21:21:50.6158116Z BDFID: 0 2025-08-14T21:21:50.6158752Z Internal Node ID: 0 2025-08-14T21:21:50.6159408Z Compute Unit: 32 2025-08-14T21:21:50.6160036Z SIMDs per CU: 0 2025-08-14T21:21:50.6160814Z Shader Engines: 0 2025-08-14T21:21:50.6161495Z Shader Arrs. per Eng.: 0 2025-08-14T21:21:50.6162206Z WatchPts on Addr. Ranges:1 2025-08-14T21:21:50.6162826Z Memory Properties: 2025-08-14T21:21:50.6163294Z Features: None 2025-08-14T21:21:50.6163754Z Pool Info: 2025-08-14T21:21:50.6164188Z Pool 1 2025-08-14T21:21:50.6164744Z Segment: GLOBAL; FLAGS: FINE GRAINED 2025-08-14T21:21:50.6165419Z Size: 65790668(0x3ebe2cc) KB 2025-08-14T21:21:50.6166079Z Allocatable: TRUE 2025-08-14T21:21:50.6166751Z Alloc Granule: 4KB 2025-08-14T21:21:50.6167461Z Alloc Recommended Granule:4KB 2025-08-14T21:21:50.6168174Z Alloc Alignment: 4KB 2025-08-14T21:21:50.6168872Z Accessible by all: TRUE 2025-08-14T21:21:50.6169472Z Pool 2 2025-08-14T21:21:50.6170315Z Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED 2025-08-14T21:21:50.6170976Z Size: 65790668(0x3ebe2cc) KB 2025-08-14T21:21:50.6171611Z Allocatable: TRUE 2025-08-14T21:21:50.6172271Z Alloc Granule: 4KB 2025-08-14T21:21:50.6172972Z Alloc Recommended Granule:4KB 2025-08-14T21:21:50.6173688Z Alloc Alignment: 4KB 2025-08-14T21:21:50.6174370Z Accessible by all: TRUE 2025-08-14T21:21:50.6174964Z Pool 3 2025-08-14T21:21:50.6175501Z Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED 2025-08-14T21:21:50.6176140Z Size: 65790668(0x3ebe2cc) KB 2025-08-14T21:21:50.6176772Z Allocatable: TRUE 2025-08-14T21:21:50.6177451Z Alloc Granule: 4KB 2025-08-14T21:21:50.6178141Z Alloc Recommended Granule:4KB 2025-08-14T21:21:50.6178847Z Alloc Alignment: 4KB 2025-08-14T21:21:50.6179525Z Accessible by all: TRUE 2025-08-14T21:21:50.6180118Z Pool 4 2025-08-14T21:21:50.6180660Z Segment: GLOBAL; FLAGS: COARSE GRAINED 2025-08-14T21:21:50.6181290Z Size: 65790668(0x3ebe2cc) KB 2025-08-14T21:21:50.6181928Z Allocatable: TRUE 2025-08-14T21:21:50.6182598Z Alloc Granule: 4KB 2025-08-14T21:21:50.6183295Z Alloc Recommended Granule:4KB 2025-08-14T21:21:50.6184002Z Alloc Alignment: 4KB 2025-08-14T21:21:50.6184696Z Accessible by all: TRUE 2025-08-14T21:21:50.6185290Z ISA Info: 2025-08-14T21:21:50.6185725Z ******* 2025-08-14T21:21:50.6186149Z Agent 2 2025-08-14T21:21:50.6186557Z ******* 2025-08-14T21:21:50.6187041Z Name: AMD EPYC 7513 32-Core Processor 2025-08-14T21:21:50.6187668Z Uuid: CPU-XX 2025-08-14T21:21:50.6188664Z Marketing Name: AMD EPYC 7513 32-Core Processor 2025-08-14T21:21:50.6189357Z Vendor Name: CPU 2025-08-14T21:21:50.6190006Z Feature: None specified 2025-08-14T21:21:50.6190655Z Profile: FULL_PROFILE 2025-08-14T21:21:50.6191314Z Float Round Mode: NEAR 2025-08-14T21:21:50.6191983Z Max Queue Number: 0(0x0) 2025-08-14T21:21:50.6192645Z Queue Min Size: 0(0x0) 2025-08-14T21:21:50.6193284Z Queue Max Size: 0(0x0) 2025-08-14T21:21:50.6193932Z Queue Type: MULTI 2025-08-14T21:21:50.6194548Z Node: 1 2025-08-14T21:21:50.6195246Z Device Type: CPU 2025-08-14T21:21:50.6195848Z Cache Info: 2025-08-14T21:21:50.6196341Z L1: 32768(0x8000) KB 2025-08-14T21:21:50.6196924Z Chip ID: 0(0x0) 2025-08-14T21:21:50.6197549Z ASIC Revision: 0(0x0) 2025-08-14T21:21:50.6198222Z Cacheline Size: 64(0x40) 2025-08-14T21:21:50.6198883Z Max Clock Freq. (MHz): 2600 2025-08-14T21:21:50.6199867Z BDFID: 0 2025-08-14T21:21:50.6200597Z Internal Node ID: 1 2025-08-14T21:21:50.6201241Z Compute Unit: 32 2025-08-14T21:21:50.6201893Z SIMDs per CU: 0 2025-08-14T21:21:50.6202531Z Shader Engines: 0 2025-08-14T21:21:50.6203214Z Shader Arrs. per Eng.: 0 2025-08-14T21:21:50.6203906Z WatchPts on Addr. Ranges:1 2025-08-14T21:21:50.6204509Z Memory Properties: 2025-08-14T21:21:50.6204964Z Features: None 2025-08-14T21:21:50.6205414Z Pool Info: 2025-08-14T21:21:50.6205843Z Pool 1 2025-08-14T21:21:50.6206389Z Segment: GLOBAL; FLAGS: FINE GRAINED 2025-08-14T21:21:50.6207039Z Size: 66046468(0x3efca04) KB 2025-08-14T21:21:50.6207697Z Allocatable: TRUE 2025-08-14T21:21:50.6208366Z Alloc Granule: 4KB 2025-08-14T21:21:50.6209071Z Alloc Recommended Granule:4KB 2025-08-14T21:21:50.6209788Z Alloc Alignment: 4KB 2025-08-14T21:21:50.6210478Z Accessible by all: TRUE 2025-08-14T21:21:50.6211080Z Pool 2 2025-08-14T21:21:50.6211631Z Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED 2025-08-14T21:21:50.6212294Z Size: 66046468(0x3efca04) KB 2025-08-14T21:21:50.6212936Z Allocatable: TRUE 2025-08-14T21:21:50.6213609Z Alloc Granule: 4KB 2025-08-14T21:21:50.6214306Z Alloc Recommended Granule:4KB 2025-08-14T21:21:50.6215015Z Alloc Alignment: 4KB 2025-08-14T21:21:50.6215710Z Accessible by all: TRUE 2025-08-14T21:21:50.6216305Z Pool 3 2025-08-14T21:21:50.6216841Z Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED 2025-08-14T21:21:50.6217475Z Size: 66046468(0x3efca04) KB 2025-08-14T21:21:50.6218468Z Allocatable: TRUE 2025-08-14T21:21:50.6219137Z Alloc Granule: 4KB 2025-08-14T21:21:50.6219823Z Alloc Recommended Granule:4KB 2025-08-14T21:21:50.6220526Z Alloc Alignment: 4KB 2025-08-14T21:21:50.6221216Z Accessible by all: TRUE 2025-08-14T21:21:50.6221803Z Pool 4 2025-08-14T21:21:50.6222344Z Segment: GLOBAL; FLAGS: COARSE GRAINED 2025-08-14T21:21:50.6222978Z Size: 66046468(0x3efca04) KB 2025-08-14T21:21:50.6223605Z Allocatable: TRUE 2025-08-14T21:21:50.6224269Z Alloc Granule: 4KB 2025-08-14T21:21:50.6224965Z Alloc Recommended Granule:4KB 2025-08-14T21:21:50.6225687Z Alloc Alignment: 4KB 2025-08-14T21:21:50.6226376Z Accessible by all: TRUE 2025-08-14T21:21:50.6226960Z ISA Info: 2025-08-14T21:21:50.6227395Z ******* 2025-08-14T21:21:50.6227806Z Agent 3 2025-08-14T21:21:50.6228212Z ******* 2025-08-14T21:21:50.6228684Z Name: gfx90a 2025-08-14T21:21:50.6229314Z Uuid: GPU-57cd99ed4cd4643e 2025-08-14T21:21:50.6230280Z Marketing Name: AMD Instinct MI210 2025-08-14T21:21:50.6230972Z Vendor Name: AMD 2025-08-14T21:21:50.6231624Z Feature: KERNEL_DISPATCH 2025-08-14T21:21:50.6232273Z Profile: BASE_PROFILE 2025-08-14T21:21:50.6232932Z Float Round Mode: NEAR 2025-08-14T21:21:50.6233609Z Max Queue Number: 128(0x80) 2025-08-14T21:21:50.6234254Z Queue Min Size: 64(0x40) 2025-08-14T21:21:50.6234894Z Queue Max Size: 131072(0x20000) 2025-08-14T21:21:50.6235540Z Queue Type: MULTI 2025-08-14T21:21:50.6236148Z Node: 2 2025-08-14T21:21:50.6236763Z Device Type: GPU 2025-08-14T21:21:50.6237353Z Cache Info: 2025-08-14T21:21:50.6237830Z L1: 16(0x10) KB 2025-08-14T21:21:50.6238398Z L2: 8192(0x2000) KB 2025-08-14T21:21:50.6238983Z Chip ID: 29711(0x740f) 2025-08-14T21:21:50.6239624Z ASIC Revision: 1(0x1) 2025-08-14T21:21:50.6240302Z Cacheline Size: 64(0x40) 2025-08-14T21:21:50.6241092Z Max Clock Freq. (MHz): 1700 2025-08-14T21:21:50.6241706Z BDFID: 768 2025-08-14T21:21:50.6242335Z Internal Node ID: 2 2025-08-14T21:21:50.6242987Z Compute Unit: 104 2025-08-14T21:21:50.6243623Z SIMDs per CU: 4 2025-08-14T21:21:50.6244282Z Shader Engines: 8 2025-08-14T21:21:50.6244954Z Shader Arrs. per Eng.: 1 2025-08-14T21:21:50.6245646Z WatchPts on Addr. Ranges:4 2025-08-14T21:21:50.6246343Z Coherent Host Access: FALSE 2025-08-14T21:21:50.6246964Z Memory Properties: 2025-08-14T21:21:50.6247462Z Features: KERNEL_DISPATCH 2025-08-14T21:21:50.6248413Z Fast F16 Operation: TRUE 2025-08-14T21:21:50.6249095Z Wavefront Size: 64(0x40) 2025-08-14T21:21:50.6249775Z Workgroup Max Size: 1024(0x400) 2025-08-14T21:21:50.6250395Z Workgroup Max Size per Dimension: 2025-08-14T21:21:50.6250934Z x 1024(0x400) 2025-08-14T21:21:50.6251484Z y 1024(0x400) 2025-08-14T21:21:50.6252028Z z 1024(0x400) 2025-08-14T21:21:50.6252629Z Max Waves Per CU: 32(0x20) 2025-08-14T21:21:50.6253310Z Max Work-item Per CU: 2048(0x800) 2025-08-14T21:21:50.6253974Z Grid Max Size: 4294967295(0xffffffff) 2025-08-14T21:21:50.6254569Z Grid Max Size per Dimension: 2025-08-14T21:21:50.6255055Z x 4294967295(0xffffffff) 2025-08-14T21:21:50.6255621Z y 4294967295(0xffffffff) 2025-08-14T21:21:50.6256168Z z 4294967295(0xffffffff) 2025-08-14T21:21:50.6256800Z Max fbarriers/Workgrp: 32 2025-08-14T21:21:50.6257562Z Packet Processor uCode:: 83 2025-08-14T21:21:50.6258260Z SDMA engine uCode:: 8 2025-08-14T21:21:50.6258919Z IOMMU Support:: None 2025-08-14T21:21:50.6259792Z Pool Info: 2025-08-14T21:21:50.6260241Z Pool 1 2025-08-14T21:21:50.6260793Z Segment: GLOBAL; FLAGS: COARSE GRAINED 2025-08-14T21:21:50.6261449Z Size: 67092480(0x3ffc000) KB 2025-08-14T21:21:50.6262107Z Allocatable: TRUE 2025-08-14T21:21:50.6262792Z Alloc Granule: 4KB 2025-08-14T21:21:50.6263518Z Alloc Recommended Granule:2048KB 2025-08-14T21:21:50.6264270Z Alloc Alignment: 4KB 2025-08-14T21:21:50.6264959Z Accessible by all: FALSE 2025-08-14T21:21:50.6265551Z Pool 2 2025-08-14T21:21:50.6266095Z Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED 2025-08-14T21:21:50.6266757Z Size: 67092480(0x3ffc000) KB 2025-08-14T21:21:50.6267449Z Allocatable: TRUE 2025-08-14T21:21:50.6268123Z Alloc Granule: 4KB 2025-08-14T21:21:50.6268834Z Alloc Recommended Granule:2048KB 2025-08-14T21:21:50.6269560Z Alloc Alignment: 4KB 2025-08-14T21:21:50.6270258Z Accessible by all: FALSE 2025-08-14T21:21:50.6270853Z Pool 3 2025-08-14T21:21:50.6271372Z Segment: GROUP 2025-08-14T21:21:50.6271981Z Size: 64(0x40) KB 2025-08-14T21:21:50.6272616Z Allocatable: FALSE 2025-08-14T21:21:50.6273301Z Alloc Granule: 0KB 2025-08-14T21:21:50.6274011Z Alloc Recommended Granule:0KB 2025-08-14T21:21:50.6274724Z Alloc Alignment: 0KB 2025-08-14T21:21:50.6275414Z Accessible by all: FALSE 2025-08-14T21:21:50.6276018Z ISA Info: 2025-08-14T21:21:50.6276452Z ISA 1 2025-08-14T21:21:50.6277024Z Name: amdgcn-amd-amdhsa--gfx90a:sramecc+:xnack- 2025-08-14T21:21:50.6278100Z Machine Models: HSA_MACHINE_MODEL_LARGE 2025-08-14T21:21:50.6278812Z Profiles: HSA_PROFILE_BASE 2025-08-14T21:21:50.6279503Z Default Rounding Mode: NEAR 2025-08-14T21:21:50.6280221Z Default Rounding Mode: NEAR 2025-08-14T21:21:50.6280984Z Fast f16: TRUE 2025-08-14T21:21:50.6281645Z Workgroup Max Size: 1024(0x400) 2025-08-14T21:21:50.6282284Z Workgroup Max Size per Dimension: 2025-08-14T21:21:50.6282852Z x 1024(0x400) 2025-08-14T21:21:50.6283420Z y 1024(0x400) 2025-08-14T21:21:50.6283971Z z 1024(0x400) 2025-08-14T21:21:50.6284570Z Grid Max Size: 4294967295(0xffffffff) 2025-08-14T21:21:50.6285188Z Grid Max Size per Dimension: 2025-08-14T21:21:50.6285684Z x 4294967295(0xffffffff) 2025-08-14T21:21:50.6286240Z y 4294967295(0xffffffff) 2025-08-14T21:21:50.6286790Z z 4294967295(0xffffffff) 2025-08-14T21:21:50.6287409Z FBarrier Max Size: 32 2025-08-14T21:21:50.6287988Z ******* 2025-08-14T21:21:50.6288404Z Agent 4 2025-08-14T21:21:50.6289096Z ******* 2025-08-14T21:21:50.6289571Z Name: gfx90a 2025-08-14T21:21:50.6290197Z Uuid: GPU-6b7c2bd8af06ff6a 2025-08-14T21:21:50.6290863Z Marketing Name: AMD Instinct MI210 2025-08-14T21:21:50.6291530Z Vendor Name: AMD 2025-08-14T21:21:50.6292188Z Feature: KERNEL_DISPATCH 2025-08-14T21:21:50.6292829Z Profile: BASE_PROFILE 2025-08-14T21:21:50.6293481Z Float Round Mode: NEAR 2025-08-14T21:21:50.6294144Z Max Queue Number: 128(0x80) 2025-08-14T21:21:50.6294795Z Queue Min Size: 64(0x40) 2025-08-14T21:21:50.6295425Z Queue Max Size: 131072(0x20000) 2025-08-14T21:21:50.6296075Z Queue Type: MULTI 2025-08-14T21:21:50.6296682Z Node: 3 2025-08-14T21:21:50.6297286Z Device Type: GPU 2025-08-14T21:21:50.6297860Z Cache Info: 2025-08-14T21:21:50.6298333Z L1: 16(0x10) KB 2025-08-14T21:21:50.6298899Z L2: 8192(0x2000) KB 2025-08-14T21:21:50.6299482Z Chip ID: 29711(0x740f) 2025-08-14T21:21:50.6300111Z ASIC Revision: 1(0x1) 2025-08-14T21:21:50.6300765Z Cacheline Size: 64(0x40) 2025-08-14T21:21:50.6301415Z Max Clock Freq. (MHz): 1700 2025-08-14T21:21:50.6302047Z BDFID: 33536 2025-08-14T21:21:50.6302690Z Internal Node ID: 3 2025-08-14T21:21:50.6303366Z Compute Unit: 104 2025-08-14T21:21:50.6304021Z SIMDs per CU: 4 2025-08-14T21:21:50.6304677Z Shader Engines: 8 2025-08-14T21:21:50.6305364Z Shader Arrs. per Eng.: 1 2025-08-14T21:21:50.6306070Z WatchPts on Addr. Ranges:4 2025-08-14T21:21:50.6307091Z Coherent Host Access: FALSE 2025-08-14T21:21:50.6307718Z Memory Properties: 2025-08-14T21:21:50.6308213Z Features: KERNEL_DISPATCH 2025-08-14T21:21:50.6308834Z Fast F16 Operation: TRUE 2025-08-14T21:21:50.6309527Z Wavefront Size: 64(0x40) 2025-08-14T21:21:50.6310202Z Workgroup Max Size: 1024(0x400) 2025-08-14T21:21:50.6310839Z Workgroup Max Size per Dimension: 2025-08-14T21:21:50.6311385Z x 1024(0x400) 2025-08-14T21:21:50.6311946Z y 1024(0x400) 2025-08-14T21:21:50.6312494Z z 1024(0x400) 2025-08-14T21:21:50.6313100Z Max Waves Per CU: 32(0x20) 2025-08-14T21:21:50.6313787Z Max Work-item Per CU: 2048(0x800) 2025-08-14T21:21:50.6314499Z Grid Max Size: 4294967295(0xffffffff) 2025-08-14T21:21:50.6315108Z Grid Max Size per Dimension: 2025-08-14T21:21:50.6315597Z x 4294967295(0xffffffff) 2025-08-14T21:21:50.6316155Z y 4294967295(0xffffffff) 2025-08-14T21:21:50.6316704Z z 4294967295(0xffffffff) 2025-08-14T21:21:50.6317341Z Max fbarriers/Workgrp: 32 2025-08-14T21:21:50.6318340Z Packet Processor uCode:: 83 2025-08-14T21:21:50.6319048Z SDMA engine uCode:: 8 2025-08-14T21:21:50.6319719Z IOMMU Support:: None 2025-08-14T21:21:50.6320312Z Pool Info: 2025-08-14T21:21:50.6320840Z Pool 1 2025-08-14T21:21:50.6321384Z Segment: GLOBAL; FLAGS: COARSE GRAINED 2025-08-14T21:21:50.6322055Z Size: 67092480(0x3ffc000) KB 2025-08-14T21:21:50.6322690Z Allocatable: TRUE 2025-08-14T21:21:50.6323356Z Alloc Granule: 4KB 2025-08-14T21:21:50.6324054Z Alloc Recommended Granule:2048KB 2025-08-14T21:21:50.6324763Z Alloc Alignment: 4KB 2025-08-14T21:21:50.6325447Z Accessible by all: FALSE 2025-08-14T21:21:50.6326047Z Pool 2 2025-08-14T21:21:50.6326590Z Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED 2025-08-14T21:21:50.6327222Z Size: 67092480(0x3ffc000) KB 2025-08-14T21:21:50.6327855Z Allocatable: TRUE 2025-08-14T21:21:50.6328519Z Alloc Granule: 4KB 2025-08-14T21:21:50.6329222Z Alloc Recommended Granule:2048KB 2025-08-14T21:21:50.6329921Z Alloc Alignment: 4KB 2025-08-14T21:21:50.6330603Z Accessible by all: FALSE 2025-08-14T21:21:50.6331190Z Pool 3 2025-08-14T21:21:50.6331696Z Segment: GROUP 2025-08-14T21:21:50.6332297Z Size: 64(0x40) KB 2025-08-14T21:21:50.6332924Z Allocatable: FALSE 2025-08-14T21:21:50.6333591Z Alloc Granule: 0KB 2025-08-14T21:21:50.6334284Z Alloc Recommended Granule:0KB 2025-08-14T21:21:50.6334980Z Alloc Alignment: 0KB 2025-08-14T21:21:50.6335661Z Accessible by all: FALSE 2025-08-14T21:21:50.6336555Z ISA Info: 2025-08-14T21:21:50.6336974Z ISA 1 2025-08-14T21:21:50.6337523Z Name: amdgcn-amd-amdhsa--gfx90a:sramecc+:xnack- 2025-08-14T21:21:50.6338251Z Machine Models: HSA_MACHINE_MODEL_LARGE 2025-08-14T21:21:50.6338948Z Profiles: HSA_PROFILE_BASE 2025-08-14T21:21:50.6339639Z Default Rounding Mode: NEAR 2025-08-14T21:21:50.6340348Z Default Rounding Mode: NEAR 2025-08-14T21:21:50.6341021Z Fast f16: TRUE 2025-08-14T21:21:50.6341674Z Workgroup Max Size: 1024(0x400) 2025-08-14T21:21:50.6342302Z Workgroup Max Size per Dimension: 2025-08-14T21:21:50.6342849Z x 1024(0x400) 2025-08-14T21:21:50.6343400Z y 1024(0x400) 2025-08-14T21:21:50.6343954Z z 1024(0x400) 2025-08-14T21:21:50.6344553Z Grid Max Size: 4294967295(0xffffffff) 2025-08-14T21:21:50.6345149Z Grid Max Size per Dimension: 2025-08-14T21:21:50.6345647Z x 4294967295(0xffffffff) 2025-08-14T21:21:50.6346198Z y 4294967295(0xffffffff) 2025-08-14T21:21:50.6346742Z z 4294967295(0xffffffff) 2025-08-14T21:21:50.6347647Z FBarrier Max Size: 32 2025-08-14T21:21:50.6348241Z *** Done *** 2025-08-14T21:21:50.6348665Z + rocminfo 2025-08-14T21:21:50.6349057Z + grep -E 'Name:.*\sgfx|Marketing' 2025-08-14T21:21:50.7641228Z Marketing Name: AMD EPYC 7513 32-Core Processor 2025-08-14T21:21:50.7642115Z Marketing Name: AMD EPYC 7513 32-Core Processor 2025-08-14T21:21:50.7642835Z Name: gfx90a 2025-08-14T21:21:50.7643510Z Marketing Name: AMD Instinct MI210 2025-08-14T21:21:50.7644144Z Name: gfx90a 2025-08-14T21:21:50.7644812Z Marketing Name: AMD Instinct MI210 2025-08-14T21:21:50.7782185Z + MAYBE_ROCM=rocm/ 2025-08-14T21:21:50.7782752Z + [[ linux-jammy-rocm-py3.10 == *xpu* ]] 2025-08-14T21:21:50.7783429Z + [[ linux-jammy-rocm-py3.10 != *-bazel-* ]] 2025-08-14T21:21:50.7784060Z + pip_install ninja==1.10.2 2025-08-14T21:21:50.7784708Z + pip_install_pkg='python3 -m pip install --progress-bar off' 2025-08-14T21:21:50.7785499Z + python3 -m pip install --progress-bar off ninja==1.10.2 2025-08-14T21:21:51.1613724Z Collecting ninja==1.10.2 2025-08-14T21:21:51.2084284Z Downloading ninja-1.10.2-py2.py3-none-manylinux_2_5_x86_64.manylinux1_x86_64.whl.metadata (5.0 kB) 2025-08-14T21:21:51.2282274Z Downloading ninja-1.10.2-py2.py3-none-manylinux_2_5_x86_64.manylinux1_x86_64.whl (108 kB) 2025-08-14T21:21:51.5632715Z Installing collected packages: ninja 2025-08-14T21:21:51.5633426Z Attempting uninstall: ninja 2025-08-14T21:21:51.5638112Z Found existing installation: ninja 1.11.1.3 2025-08-14T21:21:51.5656935Z Uninstalling ninja-1.11.1.3: 2025-08-14T21:21:51.5758614Z Successfully uninstalled ninja-1.11.1.3 2025-08-14T21:21:51.6040156Z Successfully installed ninja-1.10.2 2025-08-14T21:21:51.6489299Z + export PATH=/var/lib/jenkins/.local/bin:/opt/cache/bin:/opt/rocm/llvm/bin:/opt/rocm/opencl/bin:/opt/rocm/hip/bin:/opt/rocm/hcc/bin:/opt/rocm/bin:/opt/conda/envs/py_3.10/bin:/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin 2025-08-14T21:21:51.6492908Z + PATH=/var/lib/jenkins/.local/bin:/opt/cache/bin:/opt/rocm/llvm/bin:/opt/rocm/opencl/bin:/opt/rocm/hip/bin:/opt/rocm/hcc/bin:/opt/rocm/bin:/opt/conda/envs/py_3.10/bin:/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin 2025-08-14T21:21:51.6495428Z + [[ linux-jammy-rocm-py3.10 == *aarch64* ]] 2025-08-14T21:21:51.6496070Z + [[ linux-jammy-rocm-py3.10 == *asan* ]] 2025-08-14T21:21:51.6496679Z + [[ linux-jammy-rocm-py3.10 == *-debug* ]] 2025-08-14T21:21:51.6497283Z + [[ linux-jammy-rocm-py3.10 != *-bazel-* ]] 2025-08-14T21:21:51.6498142Z + echo 'We are not in debug mode: linux-jammy-rocm-py3.10. Expect the assertion to pass' 2025-08-14T21:21:51.6499194Z We are not in debug mode: linux-jammy-rocm-py3.10. Expect the assertion to pass 2025-08-14T21:21:51.6499972Z + cd test 2025-08-14T21:21:51.6500584Z + python -c 'import torch; torch._C._crash_if_debug_asserts_fail(424242)' 2025-08-14T21:21:52.9574507Z + [[ default == \n\o\g\p\u\_\N\O\_\A\V\X\2 ]] 2025-08-14T21:21:52.9574874Z + [[ default == \n\o\g\p\u\_\A\V\X\5\1\2 ]] 2025-08-14T21:21:52.9575180Z + [[ default == \l\e\g\a\c\y\_\n\v\i\d\i\a\_\d\r\i\v\e\r ]] 2025-08-14T21:21:52.9577063Z + DYNAMO_BENCHMARK_FLAGS=() 2025-08-14T21:21:52.9577306Z + [[ default == *pr_time_benchmarks* ]] 2025-08-14T21:21:52.9577619Z + [[ default == *dynamo_eager* ]] 2025-08-14T21:21:52.9577854Z + [[ default == *aot_eager* ]] 2025-08-14T21:21:52.9578080Z + [[ default == *aot_inductor* ]] 2025-08-14T21:21:52.9578328Z + [[ default == *max_autotune_inductor* ]] 2025-08-14T21:21:52.9578585Z + [[ default == *inductor* ]] 2025-08-14T21:21:52.9578903Z + [[ default == *dynamic* ]] 2025-08-14T21:21:52.9579118Z + [[ default == *cpu* ]] 2025-08-14T21:21:52.9579360Z + DYNAMO_BENCHMARK_FLAGS+=(--device cuda) 2025-08-14T21:21:52.9595363Z + [[ linux-jammy-rocm-py3.10 == *libtorch* ]] 2025-08-14T21:21:52.9596268Z + [[ linux-jammy-rocm-py3.10 == *-bazel-* ]] 2025-08-14T21:21:52.9598855Z + cd test 2025-08-14T21:21:52.9599093Z + python -c 'import torch; print(torch.__config__.show())' 2025-08-14T21:21:54.1065189Z PyTorch built with: 2025-08-14T21:21:54.1065774Z - GCC 11.4 2025-08-14T21:21:54.1066190Z - C++ Version: 201703 2025-08-14T21:21:54.1067181Z - Intel(R) oneAPI Math Kernel Library Version 2024.2-Product Build 20240605 for Intel(R) 64 architecture applications 2025-08-14T21:21:54.1068499Z - Intel(R) MKL-DNN v3.7.1 (Git Hash 8d263e693366ef8db40acc569cc7d8edf644556d) 2025-08-14T21:21:54.1069249Z - OpenMP 201511 (a.k.a. OpenMP 4.5) 2025-08-14T21:21:54.1070244Z - LAPACK is enabled (usually provided by MKL) 2025-08-14T21:21:54.1070806Z - NNPACK is enabled 2025-08-14T21:21:54.1071247Z - CPU capability usage: AVX2 2025-08-14T21:21:54.1071730Z - HIP Runtime 6.4.43484 2025-08-14T21:21:54.1072158Z - MIOpen 3.4.0 2025-08-14T21:21:54.1072548Z - Magma 2.7.2 2025-08-14T21:21:54.1080051Z - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, COMMIT_SHA=1fc683cf17c8c673044538d10266c00f92987be2, CXX_COMPILER=/opt/cache/bin/c++, CXX_FLAGS= -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOCUPTI -DLIBKINETO_NOXPUPTI=ON -DUSE_FBGEMM -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -DC10_NODEPRECATED -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=range-loop-construct -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-unknown-pragmas -Wno-unused-parameter -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wsuggest-override -Wno-psabi -Wno-error=old-style-cast -faligned-new -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, TORCH_VERSION=2.9.0, USE_CUDA=OFF, USE_CUDNN=OFF, USE_CUSPARSELT=OFF, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_GLOO=ON, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=ON, USE_ROCM_KERNEL_ASSERT=OFF, USE_XCCL=OFF, USE_XPU=OFF, 2025-08-14T21:21:54.1088228Z 2025-08-14T21:21:54.3500602Z + cd test 2025-08-14T21:21:54.3501258Z + python -c 'import torch; print(torch.__config__.parallel_info())' 2025-08-14T21:21:55.4061654Z ATen/Parallel: 2025-08-14T21:21:55.4062271Z at::get_num_threads() : 64 2025-08-14T21:21:55.4062848Z at::get_num_interop_threads() : 64 2025-08-14T21:21:55.4064557Z OpenMP 201511 (a.k.a. OpenMP 4.5) 2025-08-14T21:21:55.4065077Z omp_get_max_threads() : 64 2025-08-14T21:21:55.4066043Z Intel(R) oneAPI Math Kernel Library Version 2024.2-Product Build 20240605 for Intel(R) 64 architecture applications 2025-08-14T21:21:55.4067039Z mkl_get_max_threads() : 64 2025-08-14T21:21:55.4067695Z Intel(R) MKL-DNN v3.7.1 (Git Hash 8d263e693366ef8db40acc569cc7d8edf644556d) 2025-08-14T21:21:55.4068455Z std::thread::hardware_concurrency() : 64 2025-08-14T21:21:55.4068998Z Environment variables: 2025-08-14T21:21:55.4069463Z OMP_NUM_THREADS : [not set] 2025-08-14T21:21:55.4069928Z MKL_NUM_THREADS : [not set] 2025-08-14T21:21:55.4070397Z ATen parallel backend: OpenMP 2025-08-14T21:21:55.4070716Z 2025-08-14T21:21:55.6273483Z + [[ default == *numpy_2* ]] 2025-08-14T21:21:55.6278020Z + [[ linux-jammy-rocm-py3.10 == *aarch64* ]] 2025-08-14T21:21:55.6278892Z + [[ default == *backward* ]] 2025-08-14T21:21:55.6279386Z + [[ default == *xla* ]] 2025-08-14T21:21:55.6279825Z + [[ default == *executorch* ]] 2025-08-14T21:21:55.6280273Z + [[ default == \j\i\t\_\l\e\g\a\c\y ]] 2025-08-14T21:21:55.6280934Z + [[ linux-jammy-rocm-py3.10 == *libtorch* ]] 2025-08-14T21:21:55.6281430Z + [[ default == distributed ]] 2025-08-14T21:21:55.6281854Z + [[ default == *operator_benchmark* ]] 2025-08-14T21:21:55.6282304Z + [[ default == *inductor_distributed* ]] 2025-08-14T21:21:55.6282795Z + [[ default == *inductor-halide* ]] 2025-08-14T21:21:55.6283253Z + [[ default == *inductor-triton-cpu* ]] 2025-08-14T21:21:55.6283669Z + [[ default == *inductor-micro-benchmark* ]] 2025-08-14T21:21:55.6284206Z + [[ default == *huggingface* ]] 2025-08-14T21:21:55.6284510Z + [[ default == *timm* ]] 2025-08-14T21:21:55.6284717Z + [[ default == cachebench ]] 2025-08-14T21:21:55.6284925Z + [[ default == verify_cachebench ]] 2025-08-14T21:21:55.6285143Z + [[ default == *torchbench* ]] 2025-08-14T21:21:55.6285356Z + [[ default == *inductor_cpp_wrapper* ]] 2025-08-14T21:21:55.6285579Z + [[ default == *inductor* ]] 2025-08-14T21:21:55.6285777Z + [[ default == *einops* ]] 2025-08-14T21:21:55.6285974Z + [[ default == *dynamo_wrapped* ]] 2025-08-14T21:21:55.6286200Z + [[ linux-jammy-rocm-py3.10 == *rocm* ]] 2025-08-14T21:21:55.6286423Z + [[ -n '' ]] 2025-08-14T21:21:55.6286576Z + [[ 2 == 1 ]] 2025-08-14T21:21:55.6286728Z + [[ 2 == 2 ]] 2025-08-14T21:21:55.6286886Z + [[ 6 -gt 1 ]] 2025-08-14T21:21:55.6287057Z + install_torchvision 2025-08-14T21:21:55.6287244Z + local orig_preload 2025-08-14T21:21:55.6287413Z + local commit 2025-08-14T21:21:55.6287587Z ++ get_pinned_commit vision 2025-08-14T21:21:55.6287803Z ++ cat .github/ci_commit_pins/vision.txt 2025-08-14T21:21:55.6297479Z + commit=966da7e46f65d6d49df3e31214470a4fe5cc8e66 2025-08-14T21:21:55.6297824Z + orig_preload= 2025-08-14T21:21:55.6298030Z + '[' -n '' ']' 2025-08-14T21:21:55.6298247Z + [[ linux-jammy-rocm-py3.10 == *cuda* ]] 2025-08-14T21:21:55.6298790Z + pip_build_and_install git+https://github.com/pytorch/vision.git@966da7e46f65d6d49df3e31214470a4fe5cc8e66 dist/vision 2025-08-14T21:21:55.6299530Z + local build_target=git+https://github.com/pytorch/vision.git@966da7e46f65d6d49df3e31214470a4fe5cc8e66 2025-08-14T21:21:55.6299975Z + local wheel_dir=dist/vision 2025-08-14T21:21:55.6300189Z + local found_whl=0 2025-08-14T21:21:55.6300388Z + for file in "${wheel_dir}"/*.whl 2025-08-14T21:21:55.6300623Z + [[ -f dist/vision/*.whl ]] 2025-08-14T21:21:55.6300831Z + '[' 0 == 0 ']' 2025-08-14T21:21:55.6301424Z + python3 -m pip wheel --no-build-isolation --no-deps --no-use-pep517 -w dist/vision git+https://github.com/pytorch/vision.git@966da7e46f65d6d49df3e31214470a4fe5cc8e66 2025-08-14T21:21:55.9230893Z Collecting git+https://github.com/pytorch/vision.git@966da7e46f65d6d49df3e31214470a4fe5cc8e66 2025-08-14T21:21:55.9239224Z Cloning https://github.com/pytorch/vision.git (to revision 966da7e46f65d6d49df3e31214470a4fe5cc8e66) to /tmp/pip-req-build-_m3uveop 2025-08-14T21:21:55.9272560Z Running command git clone --filter=blob:none --quiet https://github.com/pytorch/vision.git /tmp/pip-req-build-_m3uveop 2025-08-14T21:21:59.0394318Z Running command git rev-parse -q --verify 'sha^966da7e46f65d6d49df3e31214470a4fe5cc8e66' 2025-08-14T21:21:59.0437068Z Running command git fetch -q https://github.com/pytorch/vision.git 966da7e46f65d6d49df3e31214470a4fe5cc8e66 2025-08-14T21:21:59.4916505Z Running command git checkout -q 966da7e46f65d6d49df3e31214470a4fe5cc8e66 2025-08-14T21:22:00.0667853Z Resolved https://github.com/pytorch/vision.git to commit 966da7e46f65d6d49df3e31214470a4fe5cc8e66 2025-08-14T21:22:02.4504268Z Preparing metadata (setup.py) ... [?25l- \ | / done 2025-08-14T21:22:02.4552642Z [?25hBuilding wheels for collected packages: torchvision 2025-08-14T21:22:02.4677525Z  DEPRECATION: Building 'torchvision' using the legacy setup.py bdist_wheel mechanism, which will be removed in a future version. pip 25.3 will enforce this behaviour change. A possible replacement is to use the standardized build interface by setting the `--use-pep517` option, (possibly combined with `--no-build-isolation`), or adding a `pyproject.toml` file to the source tree of 'torchvision'. Discussion can be found at https://github.com/pypa/pip/issues/6334 2025-08-14T21:22:53.5716199Z  Building wheel for torchvision (setup.py) ... [?25l- \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | done 2025-08-14T21:22:53.5746863Z [?25h Created wheel for torchvision: filename=torchvision-0.22.0a0+966da7e-cp310-cp310-linux_x86_64.whl size=1570530 sha256=9cab323b2e7c468beb960fdfe8155174742a73822a646788448a34011db8c2b5 2025-08-14T21:22:53.5752003Z Stored in directory: /var/lib/jenkins/.cache/pip/wheels/9c/9d/3e/42fa2d5ac6ba44a90363f8fff0fa9e712e24d4f977637c81cb 2025-08-14T21:22:53.5814753Z Successfully built torchvision 2025-08-14T21:22:53.7095190Z + for file in "${wheel_dir}"/*.whl 2025-08-14T21:22:53.7096265Z + pip_install_whl dist/vision/torchvision-0.22.0a0+966da7e-cp310-cp310-linux_x86_64.whl 2025-08-14T21:22:53.7097510Z + args=('dist/vision/torchvision-0.22.0a0+966da7e-cp310-cp310-linux_x86_64.whl') 2025-08-14T21:22:53.7098329Z + local args 2025-08-14T21:22:53.7099056Z + [[ dist/vision/torchvision-0.22.0a0+966da7e-cp310-cp310-linux_x86_64.whl == *\ * ]] 2025-08-14T21:22:53.7099911Z + for path in "${args[@]}" 2025-08-14T21:22:53.7100679Z + echo 'Installing dist/vision/torchvision-0.22.0a0+966da7e-cp310-cp310-linux_x86_64.whl' 2025-08-14T21:22:53.7101772Z Installing dist/vision/torchvision-0.22.0a0+966da7e-cp310-cp310-linux_x86_64.whl 2025-08-14T21:22:53.7103035Z + python3 -mpip install --no-index --no-deps dist/vision/torchvision-0.22.0a0+966da7e-cp310-cp310-linux_x86_64.whl 2025-08-14T21:22:54.0229523Z Processing ./dist/vision/torchvision-0.22.0a0+966da7e-cp310-cp310-linux_x86_64.whl 2025-08-14T21:22:54.0308472Z Installing collected packages: torchvision 2025-08-14T21:22:54.3930372Z Successfully installed torchvision-0.22.0a0+966da7e 2025-08-14T21:22:54.4242516Z + '[' -n '' ']' 2025-08-14T21:22:54.4242884Z + test_python_shard 2 2025-08-14T21:22:54.4243123Z + [[ -z 6 ]] 2025-08-14T21:22:54.4243741Z + python test/run_test.py --exclude-jit-executor --exclude-distributed-tests --shard 2 6 --verbose --upload-artifacts-while-running 2025-08-14T21:22:56.9233912Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/hypothesis/entry_points.py:23: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. 2025-08-14T21:22:56.9236576Z import pkg_resources 2025-08-14T21:22:57.4417210Z Excluding test_cuda_nvml_based_avail on ROCm 2025-08-14T21:22:58.0543014Z Downloading https://ossci-metrics.s3.amazonaws.com/disabled-tests-condensed.json to /var/lib/jenkins/pytorch/test/.pytorch-disabled-tests.json 2025-08-14T21:22:58.4907168Z Ignoring disabled issues: [''] 2025-08-14T21:22:58.5147388Z Found test times from artifacts 2025-08-14T21:22:58.5905251Z Found test times from artifacts 2025-08-14T21:22:58.5915186Z Running all tests 2025-08-14T21:22:58.6345714Z Running parallel tests on 2 processes 2025-08-14T21:22:58.6350150Z Name: tests to run (est. time: 149.39min) 2025-08-14T21:22:58.6350835Z Serial tests (0): 2025-08-14T21:22:58.6351294Z Parallel tests (70): 2025-08-14T21:22:58.6351805Z test_public_bindings 1/1 2025-08-14T21:22:58.6352312Z inductor/test_aot_inductor 5/5 2025-08-14T21:22:58.6352895Z inductor/test_torchinductor 1/2 2025-08-14T21:22:58.6353496Z inductor/test_torchinductor_dynamic_shapes 5/8 2025-08-14T21:22:58.6354263Z inductor/test_torchinductor_codegen_dynamic_shapes 1/6 2025-08-14T21:22:58.6354955Z inductor/test_torchinductor_opinfo 1/18 2025-08-14T21:22:58.6355550Z inductor/test_torchinductor_opinfo 5/18 2025-08-14T21:22:58.6356118Z inductor/test_torchinductor_opinfo 11/18 2025-08-14T21:22:58.6356707Z inductor/test_torchinductor_opinfo 15/18 2025-08-14T21:22:58.6357266Z inductor/test_cpu_repro 2/4 2025-08-14T21:22:58.6357789Z dynamo/test_dynamic_shapes 3/3 2025-08-14T21:22:58.6358348Z inductor/test_mkldnn_pattern_matcher 2/2 2025-08-14T21:22:58.6358946Z inductor/test_compiled_autograd 3/3 2025-08-14T21:22:58.6359498Z inductor/test_control_flow 1/2 2025-08-14T21:22:58.6360032Z inductor/test_halide 1/1 2025-08-14T21:22:58.6360672Z inductor/test_flex_decoding 4/4 2025-08-14T21:22:58.6361241Z inductor/test_torchbind 1/1 2025-08-14T21:22:58.6361732Z export/test_export 1/1 2025-08-14T21:22:58.6362156Z inductor/test_fp8 1/1 2025-08-14T21:22:58.6363080Z inductor/test_debug_trace 1/1 2025-08-14T21:22:58.6363592Z inductor/test_op_dtype_prop 2/2 2025-08-14T21:22:58.6364079Z inductor/test_analysis 1/1 2025-08-14T21:22:58.6364535Z export/test_unflatten 1/1 2025-08-14T21:22:58.6364985Z dynamo/test_interop 1/1 2025-08-14T21:22:58.6365446Z inductor/test_quantization 1/1 2025-08-14T21:22:58.6365947Z inductor/test_gpu_cpp_wrapper 1/2 2025-08-14T21:22:58.6366467Z inductor/test_gpu_cpp_wrapper 2/2 2025-08-14T21:22:58.6366947Z test_quantization 5/9 2025-08-14T21:22:58.6367369Z test_quantization 8/9 2025-08-14T21:22:58.6367816Z inductor/test_aot_inductor_utils 1/1 2025-08-14T21:22:58.6368305Z test_ops 2/9 2025-08-14T21:22:58.6368669Z test_ops 5/9 2025-08-14T21:22:58.6369047Z test_nestedtensor 2/3 2025-08-14T21:22:58.6369463Z test_modules 3/3 2025-08-14T21:22:58.6369846Z test_decomp 6/21 2025-08-14T21:22:58.6370218Z test_decomp 10/21 2025-08-14T21:22:58.6370607Z test_decomp 18/21 2025-08-14T21:22:58.6371016Z test_expanded_weights 1/1 2025-08-14T21:22:58.6371461Z test_ops_gradients 1/4 2025-08-14T21:22:58.6371885Z functorch/test_ops 4/9 2025-08-14T21:22:58.6372295Z functorch/test_ops 8/9 2025-08-14T21:22:58.6372712Z nn/test_packed_sequence 1/1 2025-08-14T21:22:58.6373155Z nn/test_pruning 1/1 2025-08-14T21:22:58.6373558Z optim/test_optim 1/1 2025-08-14T21:22:58.6374004Z profiler/test_execution_trace 1/1 2025-08-14T21:22:58.6374536Z profiler/test_profiler_tree 1/1 2025-08-14T21:22:58.6375035Z profiler/test_python_tracer 1/1 2025-08-14T21:22:58.6375543Z profiler/test_record_function 1/1 2025-08-14T21:22:58.6376034Z profiler/test_torch_tidy 1/1 2025-08-14T21:22:58.6376500Z test_accelerator 1/1 2025-08-14T21:22:58.6376918Z test_ao_sparsity 1/1 2025-08-14T21:22:58.6377361Z test_appending_byte_serializer 1/1 2025-08-14T21:22:58.6377843Z test_autoload 1/1 2025-08-14T21:22:58.6378235Z test_binary_ufuncs 1/1 2025-08-14T21:22:58.6378672Z test_functional_optim 1/1 2025-08-14T21:22:58.6379141Z test_functionalization 1/1 2025-08-14T21:22:58.6379580Z test_futures 1/1 2025-08-14T21:22:58.6379987Z test_fx_experimental 1/1 2025-08-14T21:22:58.6380409Z test_fx_passes 1/1 2025-08-14T21:22:58.6380820Z test_fx_reinplace_pass 1/1 2025-08-14T21:22:58.6381252Z test_hub 1/1 2025-08-14T21:22:58.6381653Z test_legacy_vmap 1/1 2025-08-14T21:22:58.6382429Z test_meta 3/4 2025-08-14T21:22:58.6382832Z test_sparse_csr 3/3 2025-08-14T21:22:58.6383259Z test_stateless 1/1 2025-08-14T21:22:58.6383672Z test_subclass 1/1 2025-08-14T21:22:58.6384085Z test_sympy_utils 1/1 2025-08-14T21:22:58.6384529Z test_tensorexpr_pybind 1/1 2025-08-14T21:22:58.6384998Z test_transformers 1/1 2025-08-14T21:22:58.6385445Z xpu/test_gemm 1/1 2025-08-14T21:22:58.6385878Z Name: excluded (est. time: 0.0min) 2025-08-14T21:22:58.6386361Z Serial tests (0): 2025-08-14T21:22:58.6386767Z Parallel tests (0): 2025-08-14T21:22:58.6387364Z Running test_public_bindings 1/1 ... [2025-08-14 21:22:58.635339] 2025-08-14T21:22:58.6388077Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T21:22:58.6389841Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_public_bindings.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 21:22:58.635580] 2025-08-14T21:23:01.7572649Z 2025-08-14T21:23:01.7574591Z test_public_bindings 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_public_bindings_1.1_6e8808ac7036f5d9_.log 2025-08-14T21:23:01.7576232Z Running 0 items in this shard: 2025-08-14T21:23:01.7576608Z 2025-08-14T21:23:01.7576939Z GITHUB_RUN_ID, GITHUB_RUN_ATTEMPT, or ARTIFACTS_FILE_SUFFIX not set, not uploading 2025-08-14T21:23:01.7577419Z Uploading artifacts took 0.00 seconds 2025-08-14T21:23:01.7580994Z Running inductor/test_aot_inductor 5/5 ... [2025-08-14 21:23:01.757722] 2025-08-14T21:23:01.7582036Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T21:23:01.7585840Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_aot_inductor.py', '-m', 'serial', '--shard-id=5', '--num-shards=5', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 21:23:01.758044] 2025-08-14T21:23:09.1391923Z 2025-08-14T21:23:09.1393611Z inductor/test_aot_inductor 5/5 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_aot_inductor_5.5_634a2f1aa89193dc_.log 2025-08-14T21:23:09.1395386Z Running 0 items in this shard: 2025-08-14T21:23:09.1395799Z 2025-08-14T21:23:09.1399162Z Running inductor/test_torchinductor 1/2 ... [2025-08-14 21:23:09.139429] 2025-08-14T21:23:09.1400095Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T21:23:09.1410775Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_torchinductor.py', '-m', 'serial', '--shard-id=1', '--num-shards=2', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 21:23:09.140019] 2025-08-14T21:23:23.3362734Z 2025-08-14T21:23:23.3364635Z inductor/test_torchinductor 1/2 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_torchinductor_1.2_eff4fc94fbc4082e_.log 2025-08-14T21:23:23.3366762Z Running 1 items in this shard: test/inductor/test_torchinductor.py::GPUTests::test_large_block_sizes_cuda 2025-08-14T21:23:23.3367558Z 2025-08-14T21:23:23.3368223Z Running inductor/test_torchinductor_dynamic_shapes 5/8 ... [2025-08-14 21:23:23.336357] 2025-08-14T21:23:23.3371770Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T21:23:23.3373914Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_torchinductor_dynamic_shapes.py', '-m', 'serial', '--shard-id=5', '--num-shards=8', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 21:23:23.336708] 2025-08-14T21:23:47.5543874Z 2025-08-14T21:23:47.5544955Z inductor/test_torchinductor_dynamic_shapes 5/8 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_torchinductor_dynamic_shapes_5.8_3eb1301c72ed18b1_.log 2025-08-14T21:23:47.5546502Z Running 1 items in this shard: test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_large_block_sizes_dynamic_shapes_cpu 2025-08-14T21:23:47.5547616Z 2025-08-14T21:23:47.5551915Z Running inductor/test_torchinductor_codegen_dynamic_shapes 1/6 ... [2025-08-14 21:23:47.554066] 2025-08-14T21:23:47.5552759Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T21:23:47.5554375Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_torchinductor_codegen_dynamic_shapes.py', '-m', 'serial', '--shard-id=1', '--num-shards=6', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 21:23:47.554376] 2025-08-14T21:23:54.3339652Z 2025-08-14T21:23:54.3341782Z inductor/test_torchinductor_codegen_dynamic_shapes 1/6 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_torchinductor_codegen_dynamic_shapes_1.6_c757b382b11d892a_.log 2025-08-14T21:23:54.3343718Z Running 0 items in this shard: 2025-08-14T21:23:54.3344140Z 2025-08-14T21:23:54.3348388Z Running inductor/test_torchinductor_opinfo 1/18 ... [2025-08-14 21:23:54.334282] 2025-08-14T21:23:54.3349314Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T21:23:54.3354815Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_torchinductor_opinfo.py', '-m', 'serial', '--shard-id=1', '--num-shards=18', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 21:23:54.334860] 2025-08-14T21:24:02.0167268Z 2025-08-14T21:24:02.0174791Z inductor/test_torchinductor_opinfo 1/18 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_torchinductor_opinfo_1.18_7959c4847801d62a_.log 2025-08-14T21:24:02.0175515Z Running 0 items in this shard: 2025-08-14T21:24:02.0175664Z 2025-08-14T21:24:02.0175851Z Running inductor/test_torchinductor_opinfo 5/18 ... [2025-08-14 21:24:02.016783] 2025-08-14T21:24:02.0176247Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T21:24:02.0177077Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_torchinductor_opinfo.py', '-m', 'serial', '--shard-id=5', '--num-shards=18', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 21:24:02.017244] 2025-08-14T21:24:09.7489267Z 2025-08-14T21:24:09.7491064Z inductor/test_torchinductor_opinfo 5/18 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_torchinductor_opinfo_5.18_40ad2f34a6de5a5a_.log 2025-08-14T21:24:09.7492656Z Running 0 items in this shard: 2025-08-14T21:24:09.7492938Z 2025-08-14T21:24:09.7493326Z Running inductor/test_torchinductor_opinfo 11/18 ... [2025-08-14 21:24:09.748776] 2025-08-14T21:24:09.7494091Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T21:24:09.7496592Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_torchinductor_opinfo.py', '-m', 'serial', '--shard-id=11', '--num-shards=18', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 21:24:09.749318] 2025-08-14T21:24:17.6810412Z 2025-08-14T21:24:17.6812074Z inductor/test_torchinductor_opinfo 11/18 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_torchinductor_opinfo_11.18_a17e0f9a9e2ea502_.log 2025-08-14T21:24:17.6813623Z Running 0 items in this shard: 2025-08-14T21:24:17.6813941Z 2025-08-14T21:24:17.6817362Z Running inductor/test_torchinductor_opinfo 15/18 ... [2025-08-14 21:24:17.681383] 2025-08-14T21:24:17.6818166Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T21:24:17.6827426Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_torchinductor_opinfo.py', '-m', 'serial', '--shard-id=15', '--num-shards=18', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 21:24:17.681930] 2025-08-14T21:24:25.4137744Z 2025-08-14T21:24:25.4138661Z inductor/test_torchinductor_opinfo 15/18 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_torchinductor_opinfo_15.18_18e50332a5b25c00_.log 2025-08-14T21:24:25.4139346Z Running 0 items in this shard: 2025-08-14T21:24:25.4139491Z 2025-08-14T21:24:25.4145428Z Running inductor/test_cpu_repro 2/4 ... [2025-08-14 21:24:25.414081] 2025-08-14T21:24:25.4145831Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T21:24:25.4155234Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_cpu_repro.py', '-m', 'serial', '--shard-id=2', '--num-shards=4', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 21:24:25.414665] 2025-08-14T21:24:31.3431269Z 2025-08-14T21:24:31.3433277Z inductor/test_cpu_repro 2/4 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_cpu_repro_2.4_88c2791ba0c40648_.log 2025-08-14T21:24:31.3434738Z Running 0 items in this shard: 2025-08-14T21:24:31.3435075Z 2025-08-14T21:24:31.3439550Z Running dynamo/test_dynamic_shapes 3/3 ... [2025-08-14 21:24:31.343325] 2025-08-14T21:24:31.3440688Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T21:24:31.3444123Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_dynamic_shapes.py', '-m', 'serial', '--shard-id=3', '--num-shards=3', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 21:24:31.343883] 2025-08-14T21:24:40.7791029Z 2025-08-14T21:24:40.7791859Z dynamo/test_dynamic_shapes 3/3 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_dynamic_shapes_3.3_e7177bdd88f3fca0_.log 2025-08-14T21:24:40.7792774Z Running 1 items in this shard: test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_mem_leak_guards_dynamic_shapes 2025-08-14T21:24:40.7793190Z 2025-08-14T21:24:40.7798472Z Running inductor/test_mkldnn_pattern_matcher 2/2 ... [2025-08-14 21:24:40.779389] 2025-08-14T21:24:40.7799384Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T21:24:40.7808197Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_mkldnn_pattern_matcher.py', '-m', 'serial', '--shard-id=2', '--num-shards=2', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 21:24:40.779936] 2025-08-14T21:24:46.5078951Z 2025-08-14T21:24:46.5081158Z inductor/test_mkldnn_pattern_matcher 2/2 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_mkldnn_pattern_matcher_2.2_2493c088af37c14f_.log 2025-08-14T21:24:46.5082729Z Running 0 items in this shard: 2025-08-14T21:24:46.5083079Z 2025-08-14T21:24:46.5088911Z Running inductor/test_compiled_autograd 3/3 ... [2025-08-14 21:24:46.508122] 2025-08-14T21:24:46.5089682Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T21:24:46.5091095Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_compiled_autograd.py', '-m', 'serial', '--shard-id=3', '--num-shards=3', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 21:24:46.508421] 2025-08-14T21:24:53.7389960Z 2025-08-14T21:24:53.7391779Z inductor/test_compiled_autograd 3/3 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_compiled_autograd_3.3_53a8284361554f2b_.log 2025-08-14T21:24:53.7393385Z Running 0 items in this shard: 2025-08-14T21:24:53.7393807Z 2025-08-14T21:24:53.7396232Z Running inductor/test_control_flow 1/2 ... [2025-08-14 21:24:53.739214] 2025-08-14T21:24:53.7397045Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T21:24:53.7410488Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_control_flow.py', '-m', 'serial', '--shard-id=1', '--num-shards=2', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 21:24:53.739758] 2025-08-14T21:24:59.3176251Z 2025-08-14T21:24:59.3177311Z inductor/test_control_flow 1/2 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_control_flow_1.2_1beb5fa4bd0477d3_.log 2025-08-14T21:24:59.3178090Z Running 0 items in this shard: 2025-08-14T21:24:59.3178270Z 2025-08-14T21:24:59.3182377Z Running inductor/test_halide 1/1 ... [2025-08-14 21:24:59.317864] 2025-08-14T21:24:59.3182793Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T21:24:59.3191089Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_halide.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 21:24:59.318452] 2025-08-14T21:25:05.0155018Z 2025-08-14T21:25:05.0156052Z inductor/test_halide 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_halide_1.1_a3a8fa086e23fdfc_.log 2025-08-14T21:25:05.0156749Z 2025-08-14T21:25:05.0160646Z Running inductor/test_flex_decoding 4/4 ... [2025-08-14 21:25:05.015704] 2025-08-14T21:25:05.0161095Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T21:25:05.0176535Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_flex_decoding.py', '-m', 'serial', '--shard-id=4', '--num-shards=4', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 21:25:05.016255] 2025-08-14T21:25:15.5043212Z 2025-08-14T21:25:15.5044859Z inductor/test_flex_decoding 4/4 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_flex_decoding_4.4_5bbb8f279ea28441_.log 2025-08-14T21:25:15.5047076Z Running 1 items in this shard: test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_non_pow_2_headdim_head_dim_17_float16_cuda_float16 2025-08-14T21:25:15.5048175Z 2025-08-14T21:25:15.5048687Z Running inductor/test_torchbind 1/1 ... [2025-08-14 21:25:15.504493] 2025-08-14T21:25:15.5049597Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T21:25:15.5056710Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_torchbind.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 21:25:15.505122] 2025-08-14T21:25:21.0827093Z 2025-08-14T21:25:21.0828881Z inductor/test_torchbind 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_torchbind_1.1_f4746be1d476a2bc_.log 2025-08-14T21:25:21.0830577Z Running 0 items in this shard: 2025-08-14T21:25:21.0830969Z 2025-08-14T21:25:21.0834636Z Running export/test_export 1/1 ... [2025-08-14 21:25:21.082908] 2025-08-14T21:25:21.0835382Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T21:25:21.0840521Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'export/test_export.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 21:25:21.083460] 2025-08-14T21:25:26.9117148Z 2025-08-14T21:25:26.9118732Z export/test_export 1/1 was successful, full logs can be found in artifacts with path test/test-reports/export.test_export_1.1_b5e70adccb89c9ef_.log 2025-08-14T21:25:26.9120058Z Running 0 items in this shard: 2025-08-14T21:25:26.9120551Z 2025-08-14T21:25:26.9125407Z Running inductor/test_fp8 1/1 ... [2025-08-14 21:25:26.911998] 2025-08-14T21:25:26.9126209Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T21:25:26.9132300Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_fp8.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 21:25:26.912570] 2025-08-14T21:25:32.4902254Z 2025-08-14T21:25:32.4904002Z inductor/test_fp8 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_fp8_1.1_9d07587b1ce2f21a_.log 2025-08-14T21:25:32.4905112Z Running 0 items in this shard: 2025-08-14T21:25:32.4905409Z 2025-08-14T21:25:32.4909442Z Running inductor/test_debug_trace 1/1 ... [2025-08-14 21:25:32.490484] 2025-08-14T21:25:32.4910283Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T21:25:32.4916481Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_debug_trace.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 21:25:32.491036] 2025-08-14T21:25:38.3184485Z 2025-08-14T21:25:38.3185428Z inductor/test_debug_trace 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_debug_trace_1.1_d328e0053087c33b_.log 2025-08-14T21:25:38.3186135Z Running 0 items in this shard: 2025-08-14T21:25:38.3186286Z 2025-08-14T21:25:38.3187424Z Running inductor/test_op_dtype_prop 2/2 ... [2025-08-14 21:25:38.318500] 2025-08-14T21:25:38.3187761Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T21:25:38.3190095Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_op_dtype_prop.py', '-m', 'serial', '--shard-id=2', '--num-shards=2', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 21:25:38.318756] 2025-08-14T21:25:44.6470185Z 2025-08-14T21:25:44.6471728Z inductor/test_op_dtype_prop 2/2 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_op_dtype_prop_2.2_f6c94a5d114b755b_.log 2025-08-14T21:25:44.6472921Z Running 0 items in this shard: 2025-08-14T21:25:44.6473205Z 2025-08-14T21:25:44.6476901Z Running inductor/test_analysis 1/1 ... [2025-08-14 21:25:44.647226] 2025-08-14T21:25:44.6477777Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T21:25:44.6485775Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_analysis.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 21:25:44.647758] 2025-08-14T21:25:50.4759954Z 2025-08-14T21:25:50.4761588Z inductor/test_analysis 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_analysis_1.1_57a3707d225cd1a1_.log 2025-08-14T21:25:50.4763037Z Running 0 items in this shard: 2025-08-14T21:25:50.4763375Z 2025-08-14T21:25:50.4764634Z Running export/test_unflatten 1/1 ... [2025-08-14 21:25:50.475966] 2025-08-14T21:25:50.4765371Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T21:25:50.4773914Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'export/test_unflatten.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 21:25:50.476523] 2025-08-14T21:25:53.6990154Z 2025-08-14T21:25:53.6991826Z export/test_unflatten 1/1 was successful, full logs can be found in artifacts with path test/test-reports/export.test_unflatten_1.1_20dd72b9678da674_.log 2025-08-14T21:25:53.6997617Z Running 0 items in this shard: 2025-08-14T21:25:53.6998043Z 2025-08-14T21:25:53.6999044Z Running dynamo/test_interop 1/1 ... [2025-08-14 21:25:53.699160] 2025-08-14T21:25:53.6999914Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T21:25:53.7002193Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_interop.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 21:25:53.699671] 2025-08-14T21:25:57.0725901Z 2025-08-14T21:25:57.0727369Z dynamo/test_interop 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_interop_1.1_1ef2722872c4c179_.log 2025-08-14T21:25:57.0729865Z Running 0 items in this shard: 2025-08-14T21:25:57.0730259Z 2025-08-14T21:25:57.0731323Z Running inductor/test_quantization 1/1 ... [2025-08-14 21:25:57.072709] 2025-08-14T21:25:57.0732241Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T21:25:57.0739865Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_quantization.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 21:25:57.073326] 2025-08-14T21:26:02.6506515Z 2025-08-14T21:26:02.6508207Z inductor/test_quantization 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_quantization_1.1_9210e020a6d2a853_.log 2025-08-14T21:26:02.6509663Z Running 0 items in this shard: 2025-08-14T21:26:02.6509993Z 2025-08-14T21:26:02.6512474Z Running inductor/test_gpu_cpp_wrapper 1/2 ... [2025-08-14 21:26:02.650725] 2025-08-14T21:26:02.6513278Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T21:26:02.6519741Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_gpu_cpp_wrapper.py', '-m', 'serial', '--shard-id=1', '--num-shards=2', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 21:26:02.651276] 2025-08-14T21:26:09.0303564Z 2025-08-14T21:26:09.0305987Z inductor/test_gpu_cpp_wrapper 1/2 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_gpu_cpp_wrapper_1.2_a8127b90ace399e3_.log 2025-08-14T21:26:09.0307496Z Running 0 items in this shard: 2025-08-14T21:26:09.0307897Z 2025-08-14T21:26:09.0308387Z Running inductor/test_gpu_cpp_wrapper 2/2 ... [2025-08-14 21:26:09.030192] 2025-08-14T21:26:09.0309218Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T21:26:09.0311121Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_gpu_cpp_wrapper.py', '-m', 'serial', '--shard-id=2', '--num-shards=2', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 21:26:09.030507] 2025-08-14T21:26:15.4093886Z 2025-08-14T21:26:15.4096107Z inductor/test_gpu_cpp_wrapper 2/2 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_gpu_cpp_wrapper_2.2_598396853da68b7b_.log 2025-08-14T21:26:15.4097895Z Running 0 items in this shard: 2025-08-14T21:26:15.4098300Z 2025-08-14T21:26:15.4101475Z Running test_quantization 5/9 ... [2025-08-14 21:26:15.409620] 2025-08-14T21:26:15.4102318Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T21:26:15.4106179Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_quantization.py', '-m', 'serial', '--shard-id=5', '--num-shards=9', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 21:26:15.410076] 2025-08-14T21:26:20.4864483Z 2025-08-14T21:26:20.4866391Z test_quantization 5/9 was successful, full logs can be found in artifacts with path test/test-reports/test_quantization_5.9_db1f824c6efd3e80_.log 2025-08-14T21:26:20.4867722Z Running 0 items in this shard: 2025-08-14T21:26:20.4868060Z 2025-08-14T21:26:20.4870124Z Running test_quantization 8/9 ... [2025-08-14 21:26:20.486621] 2025-08-14T21:26:20.4870838Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T21:26:20.4878048Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_quantization.py', '-m', 'serial', '--shard-id=8', '--num-shards=9', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 21:26:20.487151] 2025-08-14T21:26:25.3133245Z 2025-08-14T21:26:25.3134881Z test_quantization 8/9 was successful, full logs can be found in artifacts with path test/test-reports/test_quantization_8.9_05fac72837ed503f_.log 2025-08-14T21:26:25.3137235Z Running 0 items in this shard: 2025-08-14T21:26:25.3137574Z 2025-08-14T21:26:25.3138048Z Running inductor/test_aot_inductor_utils 1/1 ... [2025-08-14 21:26:25.313463] 2025-08-14T21:26:25.3139007Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T21:26:25.3145135Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_aot_inductor_utils.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 21:26:25.314018] 2025-08-14T21:26:30.9614329Z 2025-08-14T21:26:30.9615363Z inductor/test_aot_inductor_utils 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_aot_inductor_utils_1.1_da0d9c9626c58a12_.log 2025-08-14T21:26:30.9616177Z Running 0 items in this shard: 2025-08-14T21:26:30.9616348Z 2025-08-14T21:26:30.9625125Z Running test_ops 2/9 ... [2025-08-14 21:26:30.961764] 2025-08-14T21:26:30.9625873Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T21:26:30.9628153Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_ops.py', '-m', 'serial', '--shard-id=2', '--num-shards=9', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 21:26:30.962359] 2025-08-14T21:26:43.8055891Z 2025-08-14T21:26:43.8057622Z test_ops 2/9 was successful, full logs can be found in artifacts with path test/test-reports/test_ops_2.9_7691d57118afbbfa_.log 2025-08-14T21:26:43.8059856Z Running 0 items in this shard: 2025-08-14T21:26:43.8060218Z 2025-08-14T21:26:43.8062006Z Running test_ops 5/9 ... [2025-08-14 21:26:43.805891] 2025-08-14T21:26:43.8062665Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T21:26:43.8070887Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_ops.py', '-m', 'serial', '--shard-id=5', '--num-shards=9', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 21:26:43.806423] 2025-08-14T21:26:56.4992587Z 2025-08-14T21:26:56.4994075Z test_ops 5/9 was successful, full logs can be found in artifacts with path test/test-reports/test_ops_5.9_f46a477e09fb119f_.log 2025-08-14T21:26:56.4995276Z Running 0 items in this shard: 2025-08-14T21:26:56.4995616Z 2025-08-14T21:26:56.5000703Z Running test_nestedtensor 2/3 ... [2025-08-14 21:26:56.499534] 2025-08-14T21:26:56.5001526Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T21:26:56.5005508Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_nestedtensor.py', '-m', 'serial', '--shard-id=2', '--num-shards=3', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 21:26:56.500083] 2025-08-14T21:27:01.6262290Z 2025-08-14T21:27:01.6263092Z test_nestedtensor 2/3 was successful, full logs can be found in artifacts with path test/test-reports/test_nestedtensor_2.3_3e8b73372ce2b155_.log 2025-08-14T21:27:01.6263697Z Running 0 items in this shard: 2025-08-14T21:27:01.6263835Z 2025-08-14T21:27:01.6269745Z Running test_modules 3/3 ... [2025-08-14 21:27:01.626484] 2025-08-14T21:27:01.6270547Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T21:27:01.6276369Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_modules.py', '-m', 'serial', '--shard-id=3', '--num-shards=3', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 21:27:01.627047] 2025-08-14T21:27:06.7534380Z 2025-08-14T21:27:06.7535883Z test_modules 3/3 was successful, full logs can be found in artifacts with path test/test-reports/test_modules_3.3_1d2f3a05d79fe579_.log 2025-08-14T21:27:06.7537133Z Running 0 items in this shard: 2025-08-14T21:27:06.7537471Z 2025-08-14T21:27:06.7539764Z Running test_decomp 6/21 ... [2025-08-14 21:27:06.753582] 2025-08-14T21:27:06.7540540Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T21:27:06.7544023Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_decomp.py', '-m', 'serial', '--shard-id=6', '--num-shards=21', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 21:27:06.753911] 2025-08-14T21:27:12.8819756Z 2025-08-14T21:27:12.8825015Z test_decomp 6/21 was successful, full logs can be found in artifacts with path test/test-reports/test_decomp_6.21_f2f8df943d23bf5b_.log 2025-08-14T21:27:12.8826257Z Running 0 items in this shard: 2025-08-14T21:27:12.8826618Z 2025-08-14T21:27:12.8827079Z Running test_decomp 10/21 ... [2025-08-14 21:27:12.881944] 2025-08-14T21:27:12.8827898Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T21:27:12.8829893Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_decomp.py', '-m', 'serial', '--shard-id=10', '--num-shards=21', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 21:27:12.882316] 2025-08-14T21:27:19.0101167Z 2025-08-14T21:27:19.0101997Z test_decomp 10/21 was successful, full logs can be found in artifacts with path test/test-reports/test_decomp_10.21_4b86adf79e3ed55a_.log 2025-08-14T21:27:19.0102731Z Running 0 items in this shard: 2025-08-14T21:27:19.0102917Z 2025-08-14T21:27:19.0104504Z Running test_decomp 18/21 ... [2025-08-14 21:27:19.010187] 2025-08-14T21:27:19.0105141Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T21:27:19.0108180Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_decomp.py', '-m', 'serial', '--shard-id=18', '--num-shards=21', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 21:27:19.010489] 2025-08-14T21:27:25.0879782Z 2025-08-14T21:27:25.0881206Z test_decomp 18/21 was successful, full logs can be found in artifacts with path test/test-reports/test_decomp_18.21_a8318322ff8055bf_.log 2025-08-14T21:27:25.0882188Z Running 0 items in this shard: 2025-08-14T21:27:25.0882528Z 2025-08-14T21:27:25.0887694Z Running test_expanded_weights 1/1 ... [2025-08-14 21:27:25.088014] 2025-08-14T21:27:25.0888428Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T21:27:25.0890162Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_expanded_weights.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 21:27:25.088454] 2025-08-14T21:27:29.6631824Z 2025-08-14T21:27:29.6633828Z test_expanded_weights 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_expanded_weights_1.1_bb202b8fd3eb8574_.log 2025-08-14T21:27:29.6634925Z Running 0 items in this shard: 2025-08-14T21:27:29.6635190Z 2025-08-14T21:27:29.6635453Z Running test_ops_gradients 1/4 ... [2025-08-14 21:27:29.662989] 2025-08-14T21:27:29.6636020Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T21:27:29.6638563Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_ops_gradients.py', '-m', 'serial', '--shard-id=1', '--num-shards=4', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 21:27:29.663273] 2025-08-14T21:27:35.3900607Z 2025-08-14T21:27:35.3902215Z test_ops_gradients 1/4 was successful, full logs can be found in artifacts with path test/test-reports/test_ops_gradients_1.4_51afe7da95808700_.log 2025-08-14T21:27:35.3903530Z Running 0 items in this shard: 2025-08-14T21:27:35.3903851Z 2025-08-14T21:27:35.3905592Z Running functorch/test_ops 4/9 ... [2025-08-14 21:27:35.390111] 2025-08-14T21:27:35.3906199Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T21:27:35.3912749Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'functorch/test_ops.py', '-m', 'serial', '--shard-id=4', '--num-shards=9', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 21:27:35.390701] 2025-08-14T21:27:42.1697645Z 2025-08-14T21:27:42.1699079Z functorch/test_ops 4/9 was successful, full logs can be found in artifacts with path test/test-reports/functorch.test_ops_4.9_b5158eaf390b6b4b_.log 2025-08-14T21:27:42.1700180Z Running 0 items in this shard: 2025-08-14T21:27:42.1700472Z 2025-08-14T21:27:42.1701211Z Running functorch/test_ops 8/9 ... [2025-08-14 21:27:42.169788] 2025-08-14T21:27:42.1702111Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T21:27:42.1706351Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'functorch/test_ops.py', '-m', 'serial', '--shard-id=8', '--num-shards=9', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 21:27:42.170096] 2025-08-14T21:27:48.7984013Z 2025-08-14T21:27:48.7984989Z functorch/test_ops 8/9 was successful, full logs can be found in artifacts with path test/test-reports/functorch.test_ops_8.9_4703bac83c39f747_.log 2025-08-14T21:27:48.7985858Z Running 0 items in this shard: 2025-08-14T21:27:48.7986062Z 2025-08-14T21:27:48.7988552Z Running nn/test_packed_sequence 1/1 ... [2025-08-14 21:27:48.798532] 2025-08-14T21:27:48.7989058Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T21:27:48.7992771Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'nn/test_packed_sequence.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 21:27:48.798807] 2025-08-14T21:27:51.9207958Z 2025-08-14T21:27:51.9209668Z nn/test_packed_sequence 1/1 was successful, full logs can be found in artifacts with path test/test-reports/nn.test_packed_sequence_1.1_f36af284e76ed136_.log 2025-08-14T21:27:51.9211058Z Running 0 items in this shard: 2025-08-14T21:27:51.9211397Z 2025-08-14T21:27:51.9211904Z Running nn/test_pruning 1/1 ... [2025-08-14 21:27:51.920840] 2025-08-14T21:27:51.9212740Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T21:27:51.9220785Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'nn/test_pruning.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 21:27:51.921268] 2025-08-14T21:27:55.3940025Z 2025-08-14T21:27:55.3941451Z nn/test_pruning 1/1 was successful, full logs can be found in artifacts with path test/test-reports/nn.test_pruning_1.1_0ada8fb4c247bdb9_.log 2025-08-14T21:27:55.3942693Z Running 0 items in this shard: 2025-08-14T21:27:55.3942976Z 2025-08-14T21:27:55.3943334Z Running optim/test_optim 1/1 ... [2025-08-14 21:27:55.393975] 2025-08-14T21:27:55.3943937Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T21:27:55.3948267Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'optim/test_optim.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 21:27:55.394290] 2025-08-14T21:27:58.4157753Z 2025-08-14T21:27:58.4159106Z optim/test_optim 1/1 was successful, full logs can be found in artifacts with path test/test-reports/optim.test_optim_1.1_3d16a9ac3440a076_.log 2025-08-14T21:27:58.4160052Z 2025-08-14T21:27:58.4161003Z Running profiler/test_execution_trace 1/1 ... [2025-08-14 21:27:58.415761] 2025-08-14T21:27:58.4161675Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T21:27:58.4165670Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'profiler/test_execution_trace.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 21:27:58.416089] 2025-08-14T21:28:02.0894339Z 2025-08-14T21:28:02.0896115Z profiler/test_execution_trace 1/1 was successful, full logs can be found in artifacts with path test/test-reports/profiler.test_execution_trace_1.1_7603093e553458a9_.log 2025-08-14T21:28:02.0898426Z Running 0 items in this shard: 2025-08-14T21:28:02.0898772Z 2025-08-14T21:28:02.0899191Z Running profiler/test_profiler_tree 1/1 ... [2025-08-14 21:28:02.089359] 2025-08-14T21:28:02.0899981Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T21:28:02.0901869Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'profiler/test_profiler_tree.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 21:28:02.089669] 2025-08-14T21:28:05.2120130Z 2025-08-14T21:28:05.2121959Z profiler/test_profiler_tree 1/1 was successful, full logs can be found in artifacts with path test/test-reports/profiler.test_profiler_tree_1.1_d1b95d20a820e31d_.log 2025-08-14T21:28:05.2123438Z Running 0 items in this shard: 2025-08-14T21:28:05.2123779Z 2025-08-14T21:28:05.2130889Z Running profiler/test_python_tracer 1/1 ... [2025-08-14 21:28:05.212308] 2025-08-14T21:28:05.2131779Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T21:28:05.2133728Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'profiler/test_python_tracer.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 21:28:05.212871] 2025-08-14T21:28:08.3351994Z 2025-08-14T21:28:08.3354609Z profiler/test_python_tracer 1/1 was successful, full logs can be found in artifacts with path test/test-reports/profiler.test_python_tracer_1.1_5421a0cc59c41516_.log 2025-08-14T21:28:08.3356069Z Running 0 items in this shard: 2025-08-14T21:28:08.3356405Z 2025-08-14T21:28:08.3357069Z Running profiler/test_record_function 1/1 ... [2025-08-14 21:28:08.335321] 2025-08-14T21:28:08.3357851Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T21:28:08.3363430Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'profiler/test_record_function.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 21:28:08.335754] 2025-08-14T21:28:11.4579338Z 2025-08-14T21:28:11.4581099Z profiler/test_record_function 1/1 was successful, full logs can be found in artifacts with path test/test-reports/profiler.test_record_function_1.1_f9ed0e72ce4feb38_.log 2025-08-14T21:28:11.4582567Z Running 0 items in this shard: 2025-08-14T21:28:11.4582903Z 2025-08-14T21:28:11.4583366Z Running profiler/test_torch_tidy 1/1 ... [2025-08-14 21:28:11.457881] 2025-08-14T21:28:11.4584132Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T21:28:11.4587998Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'profiler/test_torch_tidy.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 21:28:11.458288] 2025-08-14T21:28:14.5804896Z 2025-08-14T21:28:14.5806576Z profiler/test_torch_tidy 1/1 was successful, full logs can be found in artifacts with path test/test-reports/profiler.test_torch_tidy_1.1_1b93aebed7208479_.log 2025-08-14T21:28:14.5807951Z Running 0 items in this shard: 2025-08-14T21:28:14.5808339Z 2025-08-14T21:28:14.5808726Z Running test_accelerator 1/1 ... [2025-08-14 21:28:14.580602] 2025-08-14T21:28:14.5809563Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T21:28:14.5815659Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_accelerator.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 21:28:14.580880] 2025-08-14T21:28:17.7026939Z 2025-08-14T21:28:17.7028473Z test_accelerator 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_accelerator_1.1_cf2f8d69bc650b1c_.log 2025-08-14T21:28:17.7029740Z Running 0 items in this shard: 2025-08-14T21:28:17.7030873Z 2025-08-14T21:28:17.7032166Z Running test_ao_sparsity 1/1 ... [2025-08-14 21:28:17.702838] 2025-08-14T21:28:17.7032859Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T21:28:17.7038551Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_ao_sparsity.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 21:28:17.703265] 2025-08-14T21:28:21.2769921Z 2025-08-14T21:28:21.2771715Z test_ao_sparsity 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_ao_sparsity_1.1_0b12c554f67c7d79_.log 2025-08-14T21:28:21.2773191Z Running 0 items in this shard: 2025-08-14T21:28:21.2773582Z 2025-08-14T21:28:21.2776272Z Running test_appending_byte_serializer 1/1 ... [2025-08-14 21:28:21.277159] 2025-08-14T21:28:21.2777193Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T21:28:21.2780466Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_appending_byte_serializer.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 21:28:21.277666] 2025-08-14T21:28:24.3995853Z 2025-08-14T21:28:24.3997154Z test_appending_byte_serializer 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_appending_byte_serializer_1.1_f7bec12a3bd2bed2_.log 2025-08-14T21:28:24.3998382Z Running 0 items in this shard: 2025-08-14T21:28:24.3999219Z 2025-08-14T21:28:24.4002958Z Running test_autoload 1/1 ... [2025-08-14 21:28:24.399777] 2025-08-14T21:28:24.4003692Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T21:28:24.4008293Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_autoload.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 21:28:24.400292] 2025-08-14T21:28:27.5221959Z 2025-08-14T21:28:27.5222798Z test_autoload 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_autoload_1.1_ccf9b762b7fd48c8_.log 2025-08-14T21:28:27.5223476Z Running 0 items in this shard: 2025-08-14T21:28:27.5223666Z 2025-08-14T21:28:27.5229040Z Running test_binary_ufuncs 1/1 ... [2025-08-14 21:28:27.522541] 2025-08-14T21:28:27.5229841Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T21:28:27.5235591Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_binary_ufuncs.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 21:28:27.523130] 2025-08-14T21:28:34.2521003Z 2025-08-14T21:28:34.2522459Z test_binary_ufuncs 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_binary_ufuncs_1.1_a674dc4f0e24e3a7_.log 2025-08-14T21:28:34.2523865Z Running 0 items in this shard: 2025-08-14T21:28:34.2524419Z 2025-08-14T21:28:34.2536806Z Running test_functional_optim 1/1 ... [2025-08-14 21:28:34.252386] 2025-08-14T21:28:34.2537610Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T21:28:34.2539605Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_functional_optim.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 21:28:34.252944] 2025-08-14T21:28:37.5260077Z 2025-08-14T21:28:37.5261560Z test_functional_optim 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_functional_optim_1.1_5c62bafbaf4d66a5_.log 2025-08-14T21:28:37.5262939Z Running 0 items in this shard: 2025-08-14T21:28:37.5263282Z 2025-08-14T21:28:37.5267398Z Running test_functionalization 1/1 ... [2025-08-14 21:28:37.526088] 2025-08-14T21:28:37.5268109Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T21:28:37.5270703Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_functionalization.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 21:28:37.526524] 2025-08-14T21:28:40.6490334Z 2025-08-14T21:28:40.6491903Z test_functionalization 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_functionalization_1.1_83d47e9681031732_.log 2025-08-14T21:28:40.6493434Z Running 0 items in this shard: 2025-08-14T21:28:40.6493810Z 2025-08-14T21:28:40.6494143Z Running test_futures 1/1 ... [2025-08-14 21:28:40.648759] 2025-08-14T21:28:40.6494939Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T21:28:40.6496491Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_futures.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 21:28:40.649245] 2025-08-14T21:28:43.7711515Z 2025-08-14T21:28:43.7712907Z test_futures 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_futures_1.1_8d87ea300d32f89d_.log 2025-08-14T21:28:43.7714129Z Running 0 items in this shard: 2025-08-14T21:28:43.7714465Z 2025-08-14T21:28:43.7716944Z Running test_fx_experimental 1/1 ... [2025-08-14 21:28:43.771305] 2025-08-14T21:28:43.7717676Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T21:28:43.7723814Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_fx_experimental.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 21:28:43.771732] 2025-08-14T21:28:48.5471340Z 2025-08-14T21:28:48.5472776Z test_fx_experimental 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_fx_experimental_1.1_8ebe2c2d6b0136e5_.log 2025-08-14T21:28:48.5473947Z Running 0 items in this shard: 2025-08-14T21:28:48.5474244Z 2025-08-14T21:28:48.5478464Z Running test_fx_passes 1/1 ... [2025-08-14 21:28:48.547359] 2025-08-14T21:28:48.5479223Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T21:28:48.5488590Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_fx_passes.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 21:28:48.547885] 2025-08-14T21:28:51.8207637Z 2025-08-14T21:28:51.8209193Z test_fx_passes 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_fx_passes_1.1_a3a221cd469eba35_.log 2025-08-14T21:28:51.8210510Z Running 0 items in this shard: 2025-08-14T21:28:51.8210911Z 2025-08-14T21:28:51.8213814Z Running test_fx_reinplace_pass 1/1 ... [2025-08-14 21:28:51.820883] 2025-08-14T21:28:51.8214774Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T21:28:51.8218921Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_fx_reinplace_pass.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 21:28:51.821335] 2025-08-14T21:28:54.9432391Z 2025-08-14T21:28:54.9434091Z test_fx_reinplace_pass 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_fx_reinplace_pass_1.1_161cb0e037ddfdde_.log 2025-08-14T21:28:54.9435497Z Running 0 items in this shard: 2025-08-14T21:28:54.9435851Z 2025-08-14T21:28:54.9437037Z Running test_hub 1/1 ... [2025-08-14 21:28:54.943364] 2025-08-14T21:28:54.9437689Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T21:28:54.9444270Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_hub.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 21:28:54.943799] 2025-08-14T21:28:58.0657709Z 2025-08-14T21:28:58.0659217Z test_hub 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_hub_1.1_aeda8297d13c210d_.log 2025-08-14T21:28:58.0660374Z Running 0 items in this shard: 2025-08-14T21:28:58.0660708Z 2025-08-14T21:28:58.0661490Z Running test_legacy_vmap 1/1 ... [2025-08-14 21:28:58.065821] 2025-08-14T21:28:58.0662217Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T21:28:58.0668854Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_legacy_vmap.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 21:28:58.066252] 2025-08-14T21:29:01.7896822Z 2025-08-14T21:29:01.7898057Z test_legacy_vmap 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_legacy_vmap_1.1_2a17a001389bf7e1_.log 2025-08-14T21:29:01.7899351Z Running 0 items in this shard: 2025-08-14T21:29:01.7899710Z 2025-08-14T21:29:01.7900642Z Running test_meta 3/4 ... [2025-08-14 21:29:01.789686] 2025-08-14T21:29:01.7901291Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T21:29:01.7906491Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_meta.py', '-m', 'serial', '--shard-id=3', '--num-shards=4', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 21:29:01.790081] 2025-08-14T21:29:13.7291085Z 2025-08-14T21:29:13.7291911Z test_meta 3/4 was successful, full logs can be found in artifacts with path test/test-reports/test_meta_3.4_9bb267ec160adee5_.log 2025-08-14T21:29:13.7292431Z Running 0 items in this shard: 2025-08-14T21:29:13.7292578Z 2025-08-14T21:29:13.7292725Z Running test_sparse_csr 3/3 ... [2025-08-14 21:29:13.729032] 2025-08-14T21:29:13.7293031Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T21:29:13.7295249Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_sparse_csr.py', '-m', 'serial', '--shard-id=3', '--num-shards=3', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 21:29:13.729305] 2025-08-14T21:29:19.2558230Z 2025-08-14T21:29:19.2559381Z test_sparse_csr 3/3 was successful, full logs can be found in artifacts with path test/test-reports/test_sparse_csr_3.3_86a3d661fb05e8ee_.log 2025-08-14T21:29:19.2560708Z Running 0 items in this shard: 2025-08-14T21:29:19.2561195Z 2025-08-14T21:29:19.2565790Z Running test_stateless 1/1 ... [2025-08-14 21:29:19.256138] 2025-08-14T21:29:19.2566599Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T21:29:19.2571895Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_stateless.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 21:29:19.256654] 2025-08-14T21:29:22.7799433Z 2025-08-14T21:29:22.7800840Z test_stateless 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_stateless_1.1_74f3f87c9fde7a15_.log 2025-08-14T21:29:22.7802085Z Running 0 items in this shard: 2025-08-14T21:29:22.7802415Z 2025-08-14T21:29:22.7807187Z Running test_subclass 1/1 ... [2025-08-14 21:29:22.780212] 2025-08-14T21:29:22.7807944Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T21:29:22.7813470Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_subclass.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 21:29:22.780721] 2025-08-14T21:29:26.0029289Z 2025-08-14T21:29:26.0030224Z test_subclass 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_subclass_1.1_15d4d4eb16f42c62_.log 2025-08-14T21:29:26.0031762Z Running 0 items in this shard: 2025-08-14T21:29:26.0032048Z 2025-08-14T21:29:26.0032311Z Running test_sympy_utils 1/1 ... [2025-08-14 21:29:26.002753] 2025-08-14T21:29:26.0032880Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T21:29:26.0034355Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_sympy_utils.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 21:29:26.003006] 2025-08-14T21:29:29.4759389Z 2025-08-14T21:29:29.4761201Z test_sympy_utils 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_sympy_utils_1.1_16f413085d3a703b_.log 2025-08-14T21:29:29.4762478Z Running 0 items in this shard: 2025-08-14T21:29:29.4762847Z 2025-08-14T21:29:29.4766880Z Running test_tensorexpr_pybind 1/1 ... [2025-08-14 21:29:29.476258] 2025-08-14T21:29:29.4767801Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T21:29:29.4775106Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_tensorexpr_pybind.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 21:29:29.476862] 2025-08-14T21:29:32.5988558Z 2025-08-14T21:29:32.5989828Z test_tensorexpr_pybind 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_tensorexpr_pybind_1.1_b3cbdff0ce9145c4_.log 2025-08-14T21:29:32.5990986Z Running 0 items in this shard: 2025-08-14T21:29:32.5991975Z 2025-08-14T21:29:32.5995993Z Running test_transformers 1/1 ... [2025-08-14 21:29:32.599192] 2025-08-14T21:29:32.5996772Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T21:29:32.6002820Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_transformers.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 21:29:32.599719] 2025-08-14T21:29:39.3783550Z 2025-08-14T21:29:39.3784360Z test_transformers 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_transformers_1.1_38f22f2e46e73e7e_.log 2025-08-14T21:29:39.3784937Z Running 0 items in this shard: 2025-08-14T21:29:39.3785077Z 2025-08-14T21:29:39.3786677Z Running xpu/test_gemm 1/1 ... [2025-08-14 21:29:39.378457] 2025-08-14T21:29:39.3787258Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T21:29:39.3790349Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'xpu/test_gemm.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 21:29:39.378739] 2025-08-14T21:29:42.9371420Z 2025-08-14T21:29:42.9372708Z xpu/test_gemm 1/1 was successful, full logs can be found in artifacts with path test/test-reports/xpu.test_gemm_1.1_39a83b3e803f66f7_.log 2025-08-14T21:29:42.9374301Z Running 0 items in this shard: 2025-08-14T21:29:45.4193943Z 2025-08-14T21:29:45.4196590Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/hypothesis/entry_points.py:23: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. 2025-08-14T21:29:45.4199066Z import pkg_resources 2025-08-14T21:29:45.4344766Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/hypothesis/entry_points.py:23: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. 2025-08-14T21:29:45.4347279Z import pkg_resources 2025-08-14T21:29:45.8849785Z Running test_public_bindings 1/1 ... [2025-08-14 21:29:45.884428] 2025-08-14T21:29:45.8851235Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T21:29:45.8853079Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_public_bindings.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 21:29:45.884759] 2025-08-14T21:29:45.9218198Z Running inductor/test_aot_inductor 5/5 ... [2025-08-14 21:29:45.921297] 2025-08-14T21:29:45.9219045Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T21:29:45.9220992Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_aot_inductor.py', '-m', 'not serial', '--shard-id=5', '--num-shards=5', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 21:29:45.921612] 2025-08-14T21:29:55.1691425Z 2025-08-14T21:29:55.1693187Z test_public_bindings 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_public_bindings_1.1_2fd8d93ab65ac661_.log 2025-08-14T21:29:55.1697302Z Running 4 items in this shard: test/test_public_bindings.py::TestPublicBindings::test_correct_module_names, test/test_public_bindings.py::TestPublicBindings::test_modules_can_be_imported, test/test_public_bindings.py::TestPublicBindings::test_no_new_bindings, test/test_public_bindings.py::TestPublicBindings::test_no_new_reexport_callables 2025-08-14T21:29:55.1699325Z 2025-08-14T21:29:55.1699516Z Running inductor/test_torchinductor 1/2 ... [2025-08-14 21:29:55.168695] 2025-08-14T21:29:55.1700379Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T21:29:55.1701398Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_torchinductor.py', '-m', 'not serial', '--shard-id=1', '--num-shards=2', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 21:29:55.168981] 2025-08-14T21:37:13.0622568Z 2025-08-14T21:37:13.0624655Z inductor/test_torchinductor 1/2 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_torchinductor_1.2_c7413037a90643b1_.log 2025-08-14T21:37:13.0863598Z Running 448 items in this shard: test/inductor/test_torchinductor.py::SweepInputsGPUTest::test_cuda_broadcast1_broadcast2, test/inductor/test_torchinductor.py::SweepInputsGPUTest::test_cuda_broadcast1_strided, test/inductor/test_torchinductor.py::SweepInputsGPUTest::test_cuda_broadcast1_transposed, test/inductor/test_torchinductor.py::SweepInputsGPUTest::test_cuda_broadcast2_broadcast2, test/inductor/test_torchinductor.py::SweepInputsGPUTest::test_cuda_broadcast2_double, test/inductor/test_torchinductor.py::SweepInputsGPUTest::test_cuda_broadcast2_int, test/inductor/test_torchinductor.py::SweepInputsGPUTest::test_cuda_broadcast2_transposed, test/inductor/test_torchinductor.py::SweepInputsGPUTest::test_cuda_broadcast3_dense, test/inductor/test_torchinductor.py::SweepInputsGPUTest::test_cuda_broadcast3_int, test/inductor/test_torchinductor.py::SweepInputsGPUTest::test_cuda_dense_broadcast1, test/inductor/test_torchinductor.py::SweepInputsGPUTest::test_cuda_dense_broadcast3, test/inductor/test_torchinductor.py::SweepInputsGPUTest::test_cuda_dense_dense, test/inductor/test_torchinductor.py::SweepInputsGPUTest::test_cuda_dense_double, test/inductor/test_torchinductor.py::SweepInputsGPUTest::test_cuda_dense_strided, test/inductor/test_torchinductor.py::SweepInputsGPUTest::test_cuda_double_broadcast2, test/inductor/test_torchinductor.py::SweepInputsGPUTest::test_cuda_double_double, test/inductor/test_torchinductor.py::SweepInputsGPUTest::test_cuda_double_int, test/inductor/test_torchinductor.py::SweepInputsGPUTest::test_cuda_double_strided, test/inductor/test_torchinductor.py::SweepInputsGPUTest::test_cuda_double_transposed, test/inductor/test_torchinductor.py::SweepInputsGPUTest::test_cuda_int_broadcast3, test/inductor/test_torchinductor.py::SweepInputsGPUTest::test_cuda_int_strided, test/inductor/test_torchinductor.py::SweepInputsGPUTest::test_cuda_int_transposed, test/inductor/test_torchinductor.py::SweepInputsGPUTest::test_cuda_strided_broadcast1, test/inductor/test_torchinductor.py::SweepInputsGPUTest::test_cuda_strided_broadcast3, test/inductor/test_torchinductor.py::SweepInputsGPUTest::test_cuda_strided_strided, test/inductor/test_torchinductor.py::SweepInputsGPUTest::test_cuda_strided_transposed, test/inductor/test_torchinductor.py::SweepInputsGPUTest::test_cuda_transposed_broadcast2, test/inductor/test_torchinductor.py::SweepInputsGPUTest::test_cuda_transposed_double, test/inductor/test_torchinductor.py::SweepInputsGPUTest::test_cuda_transposed_int, test/inductor/test_torchinductor.py::GPUTests::test_AllenaiLongformerBase_repro_cuda, test/inductor/test_torchinductor.py::GPUTests::test__dyn_quant_pack_4bit_weight_cuda, test/inductor/test_torchinductor.py::GPUTests::test__unsafe_masked_index_put_accumulate_cuda, test/inductor/test_torchinductor.py::GPUTests::test_abs_cuda, test/inductor/test_torchinductor.py::GPUTests::test_adaptive_avg_pool1d_argmax_cuda, test/inductor/test_torchinductor.py::GPUTests::test_adaptive_avg_pool2d1_cuda, test/inductor/test_torchinductor.py::GPUTests::test_adaptive_avg_pool_with_output_size_0_cuda, test/inductor/test_torchinductor.py::GPUTests::test_adaptive_max_pool2d1_cuda, test/inductor/test_torchinductor.py::GPUTests::test_adaptive_max_pool2d3_cuda, test/inductor/test_torchinductor.py::GPUTests::test_adaptive_pool_errors_with_long_cuda, test/inductor/test_torchinductor.py::GPUTests::test_add_complex4_cuda, test/inductor/test_torchinductor.py::GPUTests::test_add_complex5_cuda, test/inductor/test_torchinductor.py::GPUTests::test_add_complex6_cuda, test/inductor/test_torchinductor.py::GPUTests::test_add_const_float_cuda, test/inductor/test_torchinductor.py::GPUTests::test_add_inplace_permuted_cuda, test/inductor/test_torchinductor.py::GPUTests::test_addmm_cuda, test/inductor/test_torchinductor.py::GPUTests::test_alexnet_prefix_cuda, test/inductor/test_torchinductor.py::GPUTests::test_angle_cuda, test/inductor/test_torchinductor.py::GPUTests::test_any_cuda, test/inductor/test_torchinductor.py::GPUTests::test_aoti_eager_with_persistent_cache_cuda, test/inductor/test_torchinductor.py::GPUTests::test_aoti_eager_with_scalar_cuda, test/inductor/test_torchinductor.py::GPUTests::test_arange1_cuda, test/inductor/test_torchinductor.py::GPUTests::test_arange2_cuda, test/inductor/test_torchinductor.py::GPUTests::test_arange6_cuda, test/inductor/test_torchinductor.py::GPUTests::test_argmax_argmin1_cuda, test/inductor/test_torchinductor.py::GPUTests::test_argmax_argmin2_cuda, test/inductor/test_torchinductor.py::GPUTests::test_as_strided_cuda, test/inductor/test_torchinductor.py::GPUTests::test_assert_alignment_op_name_fail_cuda, test/inductor/test_torchinductor.py::GPUTests::test_assert_size_stride_op_name_fail_cuda, test/inductor/test_torchinductor.py::GPUTests::test_assert_size_stride_op_name_pass_cuda, test/inductor/test_torchinductor.py::GPUTests::test_avg_pool2d4_cuda, test/inductor/test_torchinductor.py::GPUTests::test_avg_pool2d6_cuda, test/inductor/test_torchinductor.py::GPUTests::test_avg_pool2d7_cuda, test/inductor/test_torchinductor.py::GPUTests::test_avg_pool2d8_cuda, test/inductor/test_torchinductor.py::GPUTests::test_avg_pool3d_backward4_cuda, test/inductor/test_torchinductor.py::GPUTests::test_batch_norm_2d_2_cuda, test/inductor/test_torchinductor.py::GPUTests::test_batch_norm_2d_cuda, test/inductor/test_torchinductor.py::GPUTests::test_bitwise2_cuda, test/inductor/test_torchinductor.py::GPUTests::test_bmm2_cuda, test/inductor/test_torchinductor.py::GPUTests::test_both_scalars_cuda, test/inductor/test_torchinductor.py::GPUTests::test_bucketize_add_autotune_cuda, test/inductor/test_torchinductor.py::GPUTests::test_bucketize_computed_offsets_cuda, test/inductor/test_torchinductor.py::GPUTests::test_bucketize_default_kwargs_cuda, test/inductor/test_torchinductor.py::GPUTests::test_bucketize_int_int32_int16_cuda, test/inductor/test_torchinductor.py::GPUTests::test_bucketize_int_int32_int32_cuda, test/inductor/test_torchinductor.py::GPUTests::test_bucketize_int_int32_uint8_cuda, test/inductor/test_torchinductor.py::GPUTests::test_bucketize_int_int64_int16_cuda, test/inductor/test_torchinductor.py::GPUTests::test_bucketize_int_int64_int32_cuda, test/inductor/test_torchinductor.py::GPUTests::test_bucketize_int_int64_int64_cuda, test/inductor/test_torchinductor.py::GPUTests::test_bucketize_int_int64_int8_cuda, test/inductor/test_torchinductor.py::GPUTests::test_bucketize_int_int64_uint8_cuda, test/inductor/test_torchinductor.py::GPUTests::test_bucketize_int_int8_int8_cuda, test/inductor/test_torchinductor.py::GPUTests::test_bucketize_int_int8_uint8_cuda, test/inductor/test_torchinductor.py::GPUTests::test_bucketize_int_uint8_int32_cuda, test/inductor/test_torchinductor.py::GPUTests::test_bucketize_int_uint8_int64_cuda, test/inductor/test_torchinductor.py::GPUTests::test_bucketize_int_uint8_int8_cuda, test/inductor/test_torchinductor.py::GPUTests::test_builtins_round_float_ndigits_neg_cuda, test/inductor/test_torchinductor.py::GPUTests::test_builtins_round_int_ndigits_pos_cuda, test/inductor/test_torchinductor.py::GPUTests::test_cat_cuda, test/inductor/test_torchinductor.py::GPUTests::test_cat_empty_cuda, test/inductor/test_torchinductor.py::GPUTests::test_cat_empty_index_cuda, test/inductor/test_torchinductor.py::GPUTests::test_cat_single_empty_cuda, test/inductor/test_torchinductor.py::GPUTests::test_cat_unbacked_empty_1d_cuda, test/inductor/test_torchinductor.py::GPUTests::test_cauchy_cuda, test/inductor/test_torchinductor.py::GPUTests::test_chunk_recompiles_cuda, test/inductor/test_torchinductor.py::GPUTests::test_clamp_type_promotion_cuda, test/inductor/test_torchinductor.py::GPUTests::test_clamp_type_promotion_non_tensor_cuda, test/inductor/test_torchinductor.py::GPUTests::test_clone_cuda, test/inductor/test_torchinductor.py::GPUTests::test_compar_cuda, test/inductor/test_torchinductor.py::GPUTests::test_complex_fallback_cuda, test/inductor/test_torchinductor.py::GPUTests::test_complex_memory_overlap_cuda, test/inductor/test_torchinductor.py::GPUTests::test_computed_buffer_inlining_cuda, test/inductor/test_torchinductor.py::GPUTests::test_concat_add_inplace_cuda, test/inductor/test_torchinductor.py::GPUTests::test_config_option_dont_assume_alignment_cudagraphs_cuda, test/inductor/test_torchinductor.py::GPUTests::test_config_option_dont_assume_alignment_recompiles_cuda, test/inductor/test_torchinductor.py::GPUTests::test_consecutive_split_cumprod_cuda, test/inductor/test_torchinductor.py::GPUTests::test_const_int32_to_float_cuda, test/inductor/test_torchinductor.py::GPUTests::test_constant_pad_2d_strides_nonpositive_cuda, test/inductor/test_torchinductor.py::GPUTests::test_constant_pad_3d_cuda, test/inductor/test_torchinductor.py::GPUTests::test_constant_pad_fill_dtype_cuda, test/inductor/test_torchinductor.py::GPUTests::test_constant_pad_nd_inplace_cuda, test/inductor/test_torchinductor.py::GPUTests::test_conv2d_backward_channels_last_cuda, test/inductor/test_torchinductor.py::GPUTests::test_conv2d_channels_last_cuda, test/inductor/test_torchinductor.py::GPUTests::test_conv3d_channels_last_use_block_ptr_False_cuda, test/inductor/test_torchinductor.py::GPUTests::test_conv3d_cuda, test/inductor/test_torchinductor.py::GPUTests::test_conv_shape_check_cuda, test/inductor/test_torchinductor.py::GPUTests::test_convolution1_cuda, test/inductor/test_torchinductor.py::GPUTests::test_convolution3_cuda, test/inductor/test_torchinductor.py::GPUTests::test_copy_non_blocking_is_pinned_use_cat_False_cuda, test/inductor/test_torchinductor.py::GPUTests::test_cudnn_rnn_cuda, test/inductor/test_torchinductor.py::GPUTests::test_cummin_cuda, test/inductor/test_torchinductor.py::GPUTests::test_cumprod_zero_dim_cuda, test/inductor/test_torchinductor.py::GPUTests::test_cumsum_inf_cuda, test/inductor/test_torchinductor.py::GPUTests::test_cumsum_pattern_matcher_issue_cuda, test/inductor/test_torchinductor.py::GPUTests::test_custom_op_3_cuda, test/inductor/test_torchinductor.py::GPUTests::test_custom_op_unbacked_symints_cuda, test/inductor/test_torchinductor.py::GPUTests::test_data_type_propogation_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dense_mask_index_cuda, test/inductor/test_torchinductor.py::GPUTests::test_deterministic_codegen_on_graph_break_cuda, test/inductor/test_torchinductor.py::GPUTests::test_deterministic_codegen_with_suffix_cuda, test/inductor/test_torchinductor.py::GPUTests::test_device_assert_cuda, test/inductor/test_torchinductor.py::GPUTests::test_div1_cuda, test/inductor/test_torchinductor.py::GPUTests::test_div2_cuda, test/inductor/test_torchinductor.py::GPUTests::test_div3_cuda, test/inductor/test_torchinductor.py::GPUTests::test_div4_cuda, test/inductor/test_torchinductor.py::GPUTests::test_div7_cuda, test/inductor/test_torchinductor.py::GPUTests::test_div_by_zero_cuda, test/inductor/test_torchinductor.py::GPUTests::test_div_precision_cuda, test/inductor/test_torchinductor.py::GPUTests::test_div_presicion_accuracy_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dont_constant_fold_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dropout3_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dropout_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dropout_trivial_1_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dtype_mismatch_issue_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dtypeview_bfloat16_float32_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dtypeview_bfloat16_float64_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dtypeview_bfloat16_int16_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dtypeview_bfloat16_int32_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dtypeview_bfloat16_int64_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dtypeview_bfloat16_int8_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dtypeview_float16_bfloat16_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dtypeview_float16_float32_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dtypeview_float16_int16_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dtypeview_float16_int8_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dtypeview_float32_bfloat16_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dtypeview_float32_float64_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dtypeview_float32_int16_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dtypeview_float32_int64_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dtypeview_float64_bfloat16_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dtypeview_float64_float32_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dtypeview_float64_float64_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dtypeview_float64_int32_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dtypeview_float64_int64_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dtypeview_float64_int8_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dtypeview_float64_uint8_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dtypeview_fusion_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dtypeview_int16_bfloat16_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dtypeview_int16_float64_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dtypeview_int16_int16_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dtypeview_int16_uint8_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dtypeview_int32_bfloat16_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dtypeview_int32_uint8_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dtypeview_int64_bfloat16_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dtypeview_int64_float16_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dtypeview_int64_int16_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dtypeview_int8_float16_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dtypeview_int8_int8_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dtypeview_int8_uint8_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dtypeview_uint8_int16_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dtypeview_uint8_int64_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dtypeview_uint8_int8_cuda, test/inductor/test_torchinductor.py::GPUTests::test_dtypeview_uint8_uint8_cuda, test/inductor/test_torchinductor.py::GPUTests::test_elu_cuda, test/inductor/test_torchinductor.py::GPUTests::test_embedding_bag_byte_unpack_cuda, test/inductor/test_torchinductor.py::GPUTests::test_embedding_bag_cuda, test/inductor/test_torchinductor.py::GPUTests::test_embedding_sparse_cuda, test/inductor/test_torchinductor.py::GPUTests::test_empty_strided_cuda, test/inductor/test_torchinductor.py::GPUTests::test_exp2_cuda, test/inductor/test_torchinductor.py::GPUTests::test_exp_cuda, test/inductor/test_torchinductor.py::GPUTests::test_expand_as_cuda, test/inductor/test_torchinductor.py::GPUTests::test_expand_cuda, test/inductor/test_torchinductor.py::GPUTests::test_expanded_reduction_cuda, test/inductor/test_torchinductor.py::GPUTests::test_fallback_mutable_op_basic_cuda, test/inductor/test_torchinductor.py::GPUTests::test_fallback_mutable_op_list_tensor_cuda, test/inductor/test_torchinductor.py::GPUTests::test_fallback_mutable_op_with_return_cuda, test/inductor/test_torchinductor.py::GPUTests::test_fft_real_input_real_output_cuda, test/inductor/test_torchinductor.py::GPUTests::test_fill1_cuda, test/inductor/test_torchinductor.py::GPUTests::test_float16_to_int16_cuda, test/inductor/test_torchinductor.py::GPUTests::test_float32_to_int32_cuda, test/inductor/test_torchinductor.py::GPUTests::test_float_index_expression_type_promotion_cuda, test/inductor/test_torchinductor.py::GPUTests::test_floordiv_cuda, test/inductor/test_torchinductor.py::GPUTests::test_fmod_cuda, test/inductor/test_torchinductor.py::GPUTests::test_fmod_zero_dim_cuda, test/inductor/test_torchinductor.py::GPUTests::test_full_boolean_cuda, test/inductor/test_torchinductor.py::GPUTests::test_full_like_cuda, test/inductor/test_torchinductor.py::GPUTests::test_full_like_sliced_cuda, test/inductor/test_torchinductor.py::GPUTests::test_full_truncation_cuda, test/inductor/test_torchinductor.py::GPUTests::test_functionalize_rng_wrappers_cuda, test/inductor/test_torchinductor.py::GPUTests::test_fuse_large_params_cuda, test/inductor/test_torchinductor.py::GPUTests::test_fusing_write_into_disjoint_read_cuda, test/inductor/test_torchinductor.py::GPUTests::test_gather1_cuda, test/inductor/test_torchinductor.py::GPUTests::test_gather3_cuda, test/inductor/test_torchinductor.py::GPUTests::test_gelu_cuda, test/inductor/test_torchinductor.py::GPUTests::test_generate_rand_fp8_cuda, test/inductor/test_torchinductor.py::GPUTests::test_generated_code_has_alignment_assert_cuda, test/inductor/test_torchinductor.py::GPUTests::test_generated_code_has_size_stride_assert_cuda, test/inductor/test_torchinductor.py::GPUTests::test_glu_cuda, test/inductor/test_torchinductor.py::GPUTests::test_graph_partition_arange2_cuda, test/inductor/test_torchinductor.py::GPUTests::test_graph_partition_argmax_cuda, test/inductor/test_torchinductor.py::GPUTests::test_graph_partition_both_scalars_cuda, test/inductor/test_torchinductor.py::GPUTests::test_graph_partition_constant_tensor1_cuda, test/inductor/test_torchinductor.py::GPUTests::test_graph_partition_mutation_real_name_cuda, test/inductor/test_torchinductor.py::GPUTests::test_graph_partition_pad_dynamic_cuda, test/inductor/test_torchinductor.py::GPUTests::test_graph_partition_refcount_cuda, test/inductor/test_torchinductor.py::GPUTests::test_graph_partition_scalar_inputs_cuda, test/inductor/test_torchinductor.py::GPUTests::test_grid_sampler_2d_cuda, test/inductor/test_torchinductor.py::GPUTests::test_hardtanh_cuda, test/inductor/test_torchinductor.py::GPUTests::test_horizonal_fusion1_cuda, test/inductor/test_torchinductor.py::GPUTests::test_index2_cuda, test/inductor/test_torchinductor.py::GPUTests::test_index3_cuda, test/inductor/test_torchinductor.py::GPUTests::test_index_propagation_cuda, test/inductor/test_torchinductor.py::GPUTests::test_index_propagation_flip_cuda, test/inductor/test_torchinductor.py::GPUTests::test_index_propagation_floordiv_cuda, test/inductor/test_torchinductor.py::GPUTests::test_index_propagation_remainder_cuda, test/inductor/test_torchinductor.py::GPUTests::test_index_put3_cuda, test/inductor/test_torchinductor.py::GPUTests::test_index_put_as_masked_fill_cuda, test/inductor/test_torchinductor.py::GPUTests::test_index_put_failed_reinplace_cuda, test/inductor/test_torchinductor.py::GPUTests::test_index_put_fallback1_cuda, test/inductor/test_torchinductor.py::GPUTests::test_index_put_fallback2_cuda, test/inductor/test_torchinductor.py::GPUTests::test_index_tensor_cuda, test/inductor/test_torchinductor.py::GPUTests::test_indirect_load_broadcast_cuda, test/inductor/test_torchinductor.py::GPUTests::test_inductor_assert_cuda, test/inductor/test_torchinductor.py::GPUTests::test_inductor_layout_optimization_input_mutations_cuda, test/inductor/test_torchinductor.py::GPUTests::test_inductor_triton_bucketize_respects_masking_cuda, test/inductor/test_torchinductor.py::GPUTests::test_inplace_mixed_dtype_ops_cuda, test/inductor/test_torchinductor.py::GPUTests::test_inplace_resize_as_cuda, test/inductor/test_torchinductor.py::GPUTests::test_inplace_where_pointwise_cuda, test/inductor/test_torchinductor.py::GPUTests::test_input_mutation2_cuda, test/inductor/test_torchinductor.py::GPUTests::test_int8_weight_only_quant_cuda, test/inductor/test_torchinductor.py::GPUTests::test_int_input_dynamic_shapes_cuda, test/inductor/test_torchinductor.py::GPUTests::test_invalid_operand_issue1_cuda, test/inductor/test_torchinductor.py::GPUTests::test_isin_tensor_scalar_cuda, test/inductor/test_torchinductor.py::GPUTests::test_isinf2_cuda, test/inductor/test_torchinductor.py::GPUTests::test_large_grid_use_block_ptr_False_cuda, test/inductor/test_torchinductor.py::GPUTests::test_large_offset_pointwise_cuda, test/inductor/test_torchinductor.py::GPUTests::test_layer_norm_cuda, test/inductor/test_torchinductor.py::GPUTests::test_lerp_cuda, test/inductor/test_torchinductor.py::GPUTests::test_like_channels_last_cuda, test/inductor/test_torchinductor.py::GPUTests::test_like_rands2_cuda, test/inductor/test_torchinductor.py::GPUTests::test_like_rands_cuda, test/inductor/test_torchinductor.py::GPUTests::test_linear_dynamic_maxautotune_cuda, test/inductor/test_torchinductor.py::GPUTests::test_linear_float64_cuda, test/inductor/test_torchinductor.py::GPUTests::test_linspace3_cuda, test/inductor/test_torchinductor.py::GPUTests::test_list_clearing_cuda, test/inductor/test_torchinductor.py::GPUTests::test_log1p_cuda, test/inductor/test_torchinductor.py::GPUTests::test_log2_cuda, test/inductor/test_torchinductor.py::GPUTests::test_logsumexp_cuda, test/inductor/test_torchinductor.py::GPUTests::test_long_tensor_cuda, test/inductor/test_torchinductor.py::GPUTests::test_low_memory_max_pool_dilation_1_dim_2_cuda, test/inductor/test_torchinductor.py::GPUTests::test_low_memory_max_pool_dilation_1_dim_3_cuda, test/inductor/test_torchinductor.py::GPUTests::test_masked_fill_cuda, test/inductor/test_torchinductor.py::GPUTests::test_masked_fill_promotion_cuda, test/inductor/test_torchinductor.py::GPUTests::test_max_pool2d1_cuda, test/inductor/test_torchinductor.py::GPUTests::test_max_pool2d2_cuda, test/inductor/test_torchinductor.py::GPUTests::test_max_pool2d4_cuda, test/inductor/test_torchinductor.py::GPUTests::test_max_pool2d6_dilation_1_cuda, test/inductor/test_torchinductor.py::GPUTests::test_max_pool2d6_dilation_2_cuda, test/inductor/test_torchinductor.py::GPUTests::test_max_pool2d7_cuda, test/inductor/test_torchinductor.py::GPUTests::test_max_pool2d8_cuda, test/inductor/test_torchinductor.py::GPUTests::test_max_pool2d_with_indices_backward5_cuda, test/inductor/test_torchinductor.py::GPUTests::test_max_pool2d_with_indices_backward_cuda, test/inductor/test_torchinductor.py::GPUTests::test_mean_cuda, test/inductor/test_torchinductor.py::GPUTests::test_min_max_reduction_cuda, test/inductor/test_torchinductor.py::GPUTests::test_misaligned_address_issue1_cuda, test/inductor/test_torchinductor.py::GPUTests::test_move_arange_cuda, test/inductor/test_torchinductor.py::GPUTests::test_multi_device_cuda, test/inductor/test_torchinductor.py::GPUTests::test_multi_threading_cuda, test/inductor/test_torchinductor.py::GPUTests::test_multilayer_any_cuda, test/inductor/test_torchinductor.py::GPUTests::test_multilayer_prime_size_cuda, test/inductor/test_torchinductor.py::GPUTests::test_multilayer_sum_low_prec_cuda, test/inductor/test_torchinductor.py::GPUTests::test_mutations_loop_fusion_cuda, test/inductor/test_torchinductor.py::GPUTests::test_nan_sort_stable_True_descending_False_cuda, test/inductor/test_torchinductor.py::GPUTests::test_nan_sort_stable_True_descending_True_cuda, test/inductor/test_torchinductor.py::GPUTests::test_narrow_cuda, test/inductor/test_torchinductor.py::GPUTests::test_needs_contiguous_strides_cuda, test/inductor/test_torchinductor.py::GPUTests::test_new_empty_cuda, test/inductor/test_torchinductor.py::GPUTests::test_new_empty_strided_cuda, test/inductor/test_torchinductor.py::GPUTests::test_new_ones_cuda, test/inductor/test_torchinductor.py::GPUTests::test_one_hot_cuda, test/inductor/test_torchinductor.py::GPUTests::test_pad_cast_cuda, test/inductor/test_torchinductor.py::GPUTests::test_pad_single_cuda, test/inductor/test_torchinductor.py::GPUTests::test_pad_view_cuda, test/inductor/test_torchinductor.py::GPUTests::test_philox_rand_cuda, test/inductor/test_torchinductor.py::GPUTests::test_pixel_shuffle_channels_last_cuda, test/inductor/test_torchinductor.py::GPUTests::test_pointwise_airy_ai_cuda, test/inductor/test_torchinductor.py::GPUTests::test_pointwise_bessel_y0_cuda, test/inductor/test_torchinductor.py::GPUTests::test_pointwise_bessel_y1_cuda, test/inductor/test_torchinductor.py::GPUTests::test_pointwise_chebyshev_polynomial_t_cuda, test/inductor/test_torchinductor.py::GPUTests::test_pointwise_erf_cuda, test/inductor/test_torchinductor.py::GPUTests::test_pointwise_erfcx_cuda, test/inductor/test_torchinductor.py::GPUTests::test_pointwise_erfinv_cuda, test/inductor/test_torchinductor.py::GPUTests::test_pointwise_exp2_cuda, test/inductor/test_torchinductor.py::GPUTests::test_pointwise_gammaln_cuda, test/inductor/test_torchinductor.py::GPUTests::test_pointwise_hermite_polynomial_h_cuda, test/inductor/test_torchinductor.py::GPUTests::test_pointwise_i1_cuda, test/inductor/test_torchinductor.py::GPUTests::test_pointwise_laguerre_polynomial_l_cuda, test/inductor/test_torchinductor.py::GPUTests::test_pointwise_legendre_polynomial_p_cuda, test/inductor/test_torchinductor.py::GPUTests::test_pointwise_log1p_cuda, test/inductor/test_torchinductor.py::GPUTests::test_pointwise_log_ndtr_cuda, test/inductor/test_torchinductor.py::GPUTests::test_pointwise_modified_bessel_i0_cuda, test/inductor/test_torchinductor.py::GPUTests::test_pointwise_modified_bessel_i1_cuda, test/inductor/test_torchinductor.py::GPUTests::test_pointwise_modified_bessel_k1_cuda, test/inductor/test_torchinductor.py::GPUTests::test_pointwise_multigammaln_cuda, test/inductor/test_torchinductor.py::GPUTests::test_pointwise_ndtri_cuda, test/inductor/test_torchinductor.py::GPUTests::test_pointwise_psi_cuda, test/inductor/test_torchinductor.py::GPUTests::test_pointwise_round_cuda, test/inductor/test_torchinductor.py::GPUTests::test_pointwise_scaled_modified_bessel_k0_cuda, test/inductor/test_torchinductor.py::GPUTests::test_pointwise_shifted_chebyshev_polynomial_v_cuda, test/inductor/test_torchinductor.py::GPUTests::test_pointwise_shifted_chebyshev_polynomial_w_cuda, test/inductor/test_torchinductor.py::GPUTests::test_pointwise_spherical_bessel_j0_cuda, test/inductor/test_torchinductor.py::GPUTests::test_pointwise_zeta_cuda, test/inductor/test_torchinductor.py::GPUTests::test_pow_by_natural_log2_dynamic_shapes_cuda, test/inductor/test_torchinductor.py::GPUTests::test_prod_cuda, test/inductor/test_torchinductor.py::GPUTests::test_rand_like_deterministic_cuda, test/inductor/test_torchinductor.py::GPUTests::test_randint_kernel_count_cuda, test/inductor/test_torchinductor.py::GPUTests::test_randn_with_dtype_and_device_cuda, test/inductor/test_torchinductor.py::GPUTests::test_reduction4_cuda, test/inductor/test_torchinductor.py::GPUTests::test_reflection_pad2d_cuda, test/inductor/test_torchinductor.py::GPUTests::test_remainder_cuda, test/inductor/test_torchinductor.py::GPUTests::test_remove_noop_clone_cuda, test/inductor/test_torchinductor.py::GPUTests::test_remove_noop_copy_cuda, test/inductor/test_torchinductor.py::GPUTests::test_remove_noop_slice1_cuda, test/inductor/test_torchinductor.py::GPUTests::test_repeat_as_strided_cuda, test/inductor/test_torchinductor.py::GPUTests::test_repeat_cuda, test/inductor/test_torchinductor.py::GPUTests::test_repeat_interleave_Tensor_decomp_int32_nd_2_cuda, test/inductor/test_torchinductor.py::GPUTests::test_repeat_interleave_Tensor_decomp_int64_nd_1_cuda, test/inductor/test_torchinductor.py::GPUTests::test_repeat_interleave_Tensor_decomp_int64_nd_2_cuda, test/inductor/test_torchinductor.py::GPUTests::test_repeat_interleave_cuda, test/inductor/test_torchinductor.py::GPUTests::test_roi_align_cuda, test/inductor/test_torchinductor.py::GPUTests::test_round_correctness_cuda, test/inductor/test_torchinductor.py::GPUTests::test_round_cuda, test/inductor/test_torchinductor.py::GPUTests::test_rsqrt_dynamic_shapes_cuda, test/inductor/test_torchinductor.py::GPUTests::test_scalar_cpu_tensor_arg_cuda, test/inductor/test_torchinductor.py::GPUTests::test_scalar_input_cuda, test/inductor/test_torchinductor.py::GPUTests::test_scaled_dot_product_efficient_attention_cuda, test/inductor/test_torchinductor.py::GPUTests::test_scatter6_cuda, test/inductor/test_torchinductor.py::GPUTests::test_scatter_add2_cuda, test/inductor/test_torchinductor.py::GPUTests::test_scatter_bf16_cuda, test/inductor/test_torchinductor.py::GPUTests::test_scatter_reduce2_cuda, test/inductor/test_torchinductor.py::GPUTests::test_scatter_reduce3_cuda, test/inductor/test_torchinductor.py::GPUTests::test_scheduler_vertical_fusion1_cuda, test/inductor/test_torchinductor.py::GPUTests::test_sdpa_prefer_nd_tiling_False_use_block_ptr_False_cuda, test/inductor/test_torchinductor.py::GPUTests::test_sdpa_prefer_nd_tiling_False_use_block_ptr_True_cuda, test/inductor/test_torchinductor.py::GPUTests::test_sdpa_prefer_nd_tiling_True_use_block_ptr_True_cuda, test/inductor/test_torchinductor.py::GPUTests::test_sdpa_unaligned_mask_cuda, test/inductor/test_torchinductor.py::GPUTests::test_searchsorted_broadcast_cuda, test/inductor/test_torchinductor.py::GPUTests::test_setitem_with_int_parameter_cuda, test/inductor/test_torchinductor.py::GPUTests::test_sgn_cuda, test/inductor/test_torchinductor.py::GPUTests::test_shape_prop_torch_ones_cuda, test/inductor/test_torchinductor.py::GPUTests::test_sign_dtype_cuda, test/inductor/test_torchinductor.py::GPUTests::test_simplify_loops_cuda, test/inductor/test_torchinductor.py::GPUTests::test_single_elem_indirect_cuda, test/inductor/test_torchinductor.py::GPUTests::test_size_asserts_for_multi_output_fallback_cuda, test/inductor/test_torchinductor.py::GPUTests::test_sizehint_issue1_cuda, test/inductor/test_torchinductor.py::GPUTests::test_slice2_cuda, test/inductor/test_torchinductor.py::GPUTests::test_slice_scatter3_cuda, test/inductor/test_torchinductor.py::GPUTests::test_slice_scatter_cuda, test/inductor/test_torchinductor.py::GPUTests::test_slice_view_with_graph_break_cuda, test/inductor/test_torchinductor.py::GPUTests::test_softmax_cuda, test/inductor/test_torchinductor.py::GPUTests::test_softmax_one_kernel_persist_cuda, test/inductor/test_torchinductor.py::GPUTests::test_sort_cuda, test/inductor/test_torchinductor.py::GPUTests::test_sort_stable_cuda, test/inductor/test_torchinductor.py::GPUTests::test_split_cumprod_cuda, test/inductor/test_torchinductor.py::GPUTests::test_split_failed_cuda, test/inductor/test_torchinductor.py::GPUTests::test_split_reduction_with_int64_size_cuda, test/inductor/test_torchinductor.py::GPUTests::test_split_with_list_cuda, test/inductor/test_torchinductor.py::GPUTests::test_split_with_sizes_with_unbacked_symints_cuda, test/inductor/test_torchinductor.py::GPUTests::test_split_with_unbacked_symints_cuda, test/inductor/test_torchinductor.py::GPUTests::test_sqrt_dynamic_shapes_cuda, test/inductor/test_torchinductor.py::GPUTests::test_squeeze1_cuda, test/inductor/test_torchinductor.py::GPUTests::test_stack_cuda, test/inductor/test_torchinductor.py::GPUTests::test_strided_inputs_cuda, test/inductor/test_torchinductor.py::GPUTests::test_sum4_cuda, test/inductor/test_torchinductor.py::GPUTests::test_sum_dtype_cuda, test/inductor/test_torchinductor.py::GPUTests::test_sum_int_cuda, test/inductor/test_torchinductor.py::GPUTests::test_sum_keepdims_cuda, test/inductor/test_torchinductor.py::GPUTests::test_tan_cuda, test/inductor/test_torchinductor.py::GPUTests::test_tanh_cuda, test/inductor/test_torchinductor.py::GPUTests::test_tensor1_cuda, test/inductor/test_torchinductor.py::GPUTests::test_tensor3_cuda, test/inductor/test_torchinductor.py::GPUTests::test_tensor_index_put_slice_cuda, test/inductor/test_torchinductor.py::GPUTests::test_tmp_not_defined_issue3_cuda, test/inductor/test_torchinductor.py::GPUTests::test_topk_cuda, test/inductor/test_torchinductor.py::GPUTests::test_transpose_add_cuda, test/inductor/test_torchinductor.py::GPUTests::test_transpose_cuda, test/inductor/test_torchinductor.py::GPUTests::test_triton_kernel_bool_param_cuda, test/inductor/test_torchinductor.py::GPUTests::test_uint_cuda, test/inductor/test_torchinductor.py::GPUTests::test_unroll_small_reduction_cuda, test/inductor/test_torchinductor.py::GPUTests::test_unspec_inputs_float16_cuda, test/inductor/test_torchinductor.py::GPUTests::test_unspec_inputs_int16_cuda, test/inductor/test_torchinductor.py::GPUTests::test_unspec_inputs_int64_cuda, test/inductor/test_torchinductor.py::GPUTests::test_unspec_inputs_int8_cuda, test/inductor/test_torchinductor.py::GPUTests::test_unsqueeze_cuda, test/inductor/test_torchinductor.py::GPUTests::test_unsqueeze_inplace_cuda, test/inductor/test_torchinductor.py::GPUTests::test_upsample_bicubic2d_cuda, test/inductor/test_torchinductor.py::GPUTests::test_upsample_cat_conv_cuda, test/inductor/test_torchinductor.py::GPUTests::test_upsample_nearest2d_backward_cuda, test/inductor/test_torchinductor.py::GPUTests::test_var_mean_div_by_cuda, test/inductor/test_torchinductor.py::GPUTests::test_vdd_clamp_cuda, test/inductor/test_torchinductor.py::GPUTests::test_vertical_fusion1_cuda, test/inductor/test_torchinductor.py::GPUTests::test_view_as_real_cuda, test/inductor/test_torchinductor.py::GPUTests::test_views1_cuda, test/inductor/test_torchinductor.py::GPUTests::test_views2_cuda, test/inductor/test_torchinductor.py::GPUTests::test_views3_cuda, test/inductor/test_torchinductor.py::GPUTests::test_where_broadcast_cuda, test/inductor/test_torchinductor.py::GPUTests::test_where_with_logical_op_cuda, test/inductor/test_torchinductor.py::GPUTests::test_zeros_cuda, test/inductor/test_torchinductor.py::TritonCodeGenTests::test_bandwidth_profiler, test/inductor/test_torchinductor.py::TritonCodeGenTests::test_codegen_config_option_dont_assume_alignment, test/inductor/test_torchinductor.py::TritonCodeGenTests::test_comment_graph_fragment, test/inductor/test_torchinductor.py::TritonCodeGenTests::test_computed_indirect_mask, test/inductor/test_torchinductor.py::TritonCodeGenTests::test_constant_folding_deallocation, test/inductor/test_torchinductor.py::TritonCodeGenTests::test_divisible_by_16_covers_numel_args, test/inductor/test_torchinductor.py::TritonCodeGenTests::test_donated_buffer_inplace_gpt, test/inductor/test_torchinductor.py::TritonCodeGenTests::test_evict_last_non_coalesced_loads, test/inductor/test_torchinductor.py::TritonCodeGenTests::test_indirect_device_assert, test/inductor/test_torchinductor.py::TritonCodeGenTests::test_inductor_detach_view_backend_aot_eager, test/inductor/test_torchinductor.py::TritonCodeGenTests::test_kernel_names_descriptive, test/inductor/test_torchinductor.py::TritonCodeGenTests::test_layer_norm_inplaces_after_matmul, test/inductor/test_torchinductor.py::TritonCodeGenTests::test_non_blocking_copy_codegen, test/inductor/test_torchinductor.py::TritonCodeGenTests::test_numpy_autograd, test/inductor/test_torchinductor.py::TritonCodeGenTests::test_optimize_indexing_dtype, test/inductor/test_torchinductor.py::TritonCodeGenTests::test_red_followed_by_transposed_pointwise, test/inductor/test_torchinductor.py::TritonCodeGenTests::test_respect_scaled_grouped_mm_layout_tag, test/inductor/test_torchinductor.py::TritonCodeGenTests::test_sdpa_inference_mode_aot_compile, test/inductor/test_torchinductor.py::TritonCodeGenTests::test_skip_l1_cache, test/inductor/test_torchinductor.py::TritonCodeGenTests::test_split_op_with_sym, test/inductor/test_torchinductor.py::TritonCodeGenTests::test_triton_attrs_dict_constexpr_signature, test/inductor/test_torchinductor.py::NanCheckerTest::test_nan_checker_fail 2025-08-14T21:37:13.1093273Z 2025-08-14T21:37:13.1093766Z Running inductor/test_torchinductor_dynamic_shapes 5/8 ... [2025-08-14 21:37:13.063942] 2025-08-14T21:37:13.1094637Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T21:37:13.1096618Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_torchinductor_dynamic_shapes.py', '-m', 'not serial', '--shard-id=5', '--num-shards=8', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 21:37:13.064530] 2025-08-14T21:38:29.9617736Z 2025-08-14T21:38:29.9621072Z inductor/test_aot_inductor 5/5 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_aot_inductor_5.5_936d8429ec3ef514_.log 2025-08-14T21:38:29.9698421Z Running 173 items in this shard: test/inductor/test_aot_inductor.py::AOTInductorLoggingTest::test_shape_env_reuse_zero_consts_use_consts_asm_false, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test__int_mm_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test__weight_int4pack_mm_m_32_n_64_q_group_64_num_groups_1_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test__weight_int4pack_mm_m_32_n_64_q_group_64_num_groups_2_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_addmm_multiple_dynamic_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_aliased_buffer_reuse_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_aoti_constant_tensor_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_aoti_debug_printer_cpp_kernel_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_aoti_debug_printer_fp8_dtype_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_aoti_debug_printer_user_defined_triton_kernel_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_aoti_profiler_enable_kernel_profile_False_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_assert_tensor_meta_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_buffer_mutation_1_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_cond_mismatched_branch_output_dynamic_False_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_cond_symint_input_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_deconv_freezing_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_dup_unbacked_sym_decl_with_refinement_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_fake_tensor_device_validation_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_fqn_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_index_put_with_none_index_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_int_list_input_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_linear_dynamic_maxautotune_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_missing_cubin_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_missing_output_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_multiple_output_alias_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_narrow_fallback_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_nested_tensor_from_jagged_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_none_args_aot_codegen_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_pad_fallback_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_poi_multiple_dynamic_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_profile_benchmark_harness_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_repeated_user_defined_triton_kernel_embed_kernel_binary_True_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_return_constant_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_reuse_kernel_dynamic_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_seq_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_shifted_constraint_ranges_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_simple_embed_kernel_binary_False_max_autotune_False_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_symfloat_item_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_sympy_cpp_printer_min_max_minmax1_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_torchvision_transforms_functional_tensor_resize_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_dynamic_launcher_grid_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_bool_param_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_extern_kernel_arg_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_grid_type_2_num_dims_1_dynamic_True_autotune_True_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_grid_type_2_num_dims_2_dynamic_False_autotune_True_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_tma_descriptor_1d_dynamic_False_tma_version_new_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_tma_descriptor_1d_dynamic_True_tma_version_new_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_with_none_input_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_with_none_inputs_and_equal_to_1_arg_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_update_inactive_constant_buffer_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_while_loop_nested_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_while_loop_simple_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_while_loop_with_conv_dynamic_False_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_while_loop_with_mixed_device_dynamic_True_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_while_loop_with_sym_expr_cond_dynamic_True_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_while_loop_with_unbacked_symint_closure_dynamic_False_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_while_loop_with_unbacked_symint_closure_dynamic_True_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_with_cudagraphs_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_with_offset_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_zero_grid_with_backed_symbols_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_zero_grid_with_unbacked_symbols_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test__weight_int4pack_mm_m_32_n_64_q_group_64_num_groups_2_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test__weight_int4pack_mm_with_scales_and_zeros_m_32_n_64_q_group_64_num_groups_1_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test__weight_int4pack_mm_with_scales_and_zeros_m_32_n_64_q_group_64_num_groups_2_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_aoti_constant_tensor_name_collision_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_aoti_debug_printer_sym_inputs_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_aoti_debug_printer_user_defined_triton_kernel_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_aoti_runtime_asserts_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_assert_async_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_bmm_multiple_dynamic_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_buffer_mutation_2_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_buffer_reuse_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_cond_mismatched_branch_output_dynamic_True_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_cond_non_tensor_predicates_dynamic_False_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_cond_simple_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_cond_with_reinterpret_view_inputs_outputs_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_empty_cat_dtype_promotion_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_fallback_kernel_with_symexpr_output_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_fft_c2c_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_fp8_view_of_param_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_fx_gm_return_tuple_validation_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_issue_140766_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_large_weight_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_misaligned_input_2_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_nan_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_none_args_aot_codegen_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_pad_fallback_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_poi_multiple_dynamic_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_pytree_inputs_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_runtime_checks_complex_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_simple_embed_kernel_binary_False_max_autotune_True_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_simple_split_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_size_with_unbacked_add_and_mul_expr_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_size_with_unbacked_add_expr_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_torchvision_transforms_functional_tensor_resize_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_dynamic_launcher_grid_infer_from_tensor_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_bool_param_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_grid_type_1_num_dims_1_dynamic_True_autotune_True_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_grid_type_1_num_dims_2_dynamic_True_autotune_True_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_grid_type_2_num_dims_2_dynamic_False_autotune_False_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_grid_type_2_num_dims_2_dynamic_False_autotune_True_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_grid_type_2_num_dims_2_dynamic_True_autotune_True_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_grid_type_3_num_dims_2_dynamic_False_autotune_True_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_multi_output_arg_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_tma_descriptor_1d_dynamic_False_tma_version_new_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_weird_param_order_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_with_none_inputs_and_equal_to_1_arg_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_unbacked_equals_input_size_runtime_assertion_mark_unbacked_False_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_update_constant_buffer_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_update_user_managed_buffer_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_upper_bound_i64_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_using_model_name_for_files_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_weight_on_disk_legacy_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_while_loop_nested_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_while_loop_with_conv_dynamic_False_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_while_loop_with_parameters_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_while_loop_with_sym_expr_cond_dynamic_False_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_zero_size_buffer_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_zero_size_weight_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_amp_fallback_random_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_aoti_constant_tensor_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_assert_async_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_backward_no_op_logging_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_buffer_mutation_2_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_buffer_mutation_and_force_mmap_weights_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_buffer_reuse_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_cond_simple_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_cond_symint_input_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_fallback_kernel_with_symexpr_output_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_fft_c2c_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_fqn_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_index_put_with_none_index_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_issue_140766_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_large_mmaped_weights_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_large_weight_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_misaligned_input_2_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_no_args_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_on_gpu_device1_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_output_path_2_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_proxy_executor_hann_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_proxy_executor_permute_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_repeat_interleave_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_repeat_output_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_repeated_user_defined_triton_kernel_embed_kernel_binary_False_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_repeated_user_defined_triton_kernel_embed_kernel_binary_True_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_replicate_on_devices_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_reuse_kernel_dynamic_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_runtime_checks_dtype_failed_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_sdpa_2_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_shifted_constraint_ranges_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_simple_embed_kernel_binary_False_max_autotune_True_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_size_from_multi_output_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_small_constant_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_subclasses_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_sympy_cpp_printer_min_max_minmax0_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_torchvision_transforms_functional_tensor_resize_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_equal_to_1_float_arg_dynamic_False_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_grid_type_1_num_dims_2_dynamic_False_autotune_False_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_grid_type_2_num_dims_2_dynamic_True_autotune_False_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_on_device_tma_dynamic_False_tma_version_old_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_on_device_tma_dynamic_True_tma_version_new_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_reinterpret_view_mem_leak_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_tma_descriptor_1d_dynamic_False_tma_version_old_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_tma_descriptor_2d_dynamic_False_tma_version_old_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_tma_descriptor_2d_dynamic_True_tma_version_old_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_unbacked_symint_in_grid_dynamic_False_autotuning_True_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_with_none_input_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_with_none_inputs_and_equal_to_1_arg_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_update_user_managed_buffer_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_upper_bound_i64_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_while_loop_with_unbacked_symint_closure_dynamic_False_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_with_cudagraphs_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_zero_grid_with_unbacked_symbols_mps 2025-08-14T21:38:29.9787491Z 2025-08-14T21:38:29.9787732Z Running inductor/test_torchinductor_codegen_dynamic_shapes 1/6 ... [2025-08-14 21:38:29.962217] 2025-08-14T21:38:29.9788121Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T21:38:29.9788965Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_torchinductor_codegen_dynamic_shapes.py', '-m', 'not serial', '--shard-id=1', '--num-shards=6', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 21:38:29.962561] 2025-08-14T21:44:23.6352373Z 2025-08-14T21:44:23.6354582Z inductor/test_torchinductor_dynamic_shapes 5/8 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_torchinductor_dynamic_shapes_5.8_0098b63c6e41e120_.log 2025-08-14T21:44:23.6521743Z Running 200 items in this shard: test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_adaptive_avg_pool1d_argmax_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_adaptive_avg_pool2d_low_prec_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_adaptive_avg_pool_with_output_size_0_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_adaptive_pool_errors_with_long_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_add_complex4_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_add_complex_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_add_const_float_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_addmm_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_angle_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_any_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_arange5_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_arange6_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_batch_norm_2d_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_bitwise2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_bucketize_int_int16_int16_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_bucketize_int_int16_int64_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_bucketize_int_int64_int8_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_bucketize_int_int8_int32_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_bucketize_int_int8_int8_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_bucketize_int_uint8_int8_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_buffer_copied_in_graph_with_different_shapes_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_builtins_round_int_ndigits_pos_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_cat_empty_index_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_conv_bn_fuse_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_conv_with_as_strided_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_cudnn_rnn_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_custom_scan_op_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dist_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_div7_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_div9_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_div_zero_dim_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_float16_float16_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_float16_float32_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_float16_int32_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_float32_float16_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_float64_float64_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_float64_uint8_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_int16_int8_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_int64_uint8_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_int8_float16_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_uint8_float32_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_fmin_fmax_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_fractional_max_pool2d1_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_fractional_max_pool2d2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_full_like_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_gather2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_graph_partition_unbacked_symint_as_output_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_horizonal_fusion2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_index2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_index_propagation_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_index_propagation_nested_indirect_indexing_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_index_put_deterministic_fallback_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_index_tensor_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_int_input_dynamic_shapes_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_leaky_relu_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_like_rands_sliced_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_linspace2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_log_softmax_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_max_pool2d1_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_max_pool2d4_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_max_pool2d_with_indices_backward2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_max_pool2d_with_indices_backward3_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_max_pool2d_with_indices_backward4_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_nan_sort_stable_False_descending_True_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pointwise_bessel_j0_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pointwise_erfc_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pointwise_erfcx_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pointwise_gammainc_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pointwise_hermite_polynomial_he_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pointwise_log_ndtr_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pointwise_logit_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pointwise_modified_bessel_i1_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pointwise_multigammaln_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_randn_like_empty_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_randn_with_dtype_and_device_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_reduction4_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_remove_noop_view_default_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_repeat_interleave_2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_repeat_interleave_Tensor_decomp_int32_nd_1_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_scalar_output_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_scaled_dot_product_attention_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_scaled_dot_product_efficient_attention_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_scatter_add2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_sdpa_prefer_nd_tiling_True_use_block_ptr_False_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_sign_dtype_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_size_asserts_for_multi_output_fallback_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_slice_scatter5_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_softmax_backward_data_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_squeeze_varargs_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_transpose_add_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_unbacked_floordiv_simplify_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_upsample_bilinear2d_a_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_upsample_cat_conv_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_vdd_clamp_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_vectorized_ops_masked_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_views2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_adaptive_avg_pool1d_argmax_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_adaptive_avg_pool2d1_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_add_complex4_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_aoti_eager_support_out_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_aoti_eager_with_persistent_cache_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_avg_pool2d6_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_avg_pool3d_backward2_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_bucketize_int_int16_int32_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_bucketize_int_int16_uint8_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_bucketize_int_int64_int8_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_bucketize_int_int8_int16_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_bucketize_int_uint8_uint8_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_buffer_copied_in_graph_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_buffer_copied_in_graph_with_different_shapes_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_cat_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_cat_unbacked_2d_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_cat_unbacked_legacy_empty_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_cauchy_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_chunk_recompiles_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_conv_backward_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_convolution2_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_convolution3_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_convolution4_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_cumsum_no_mask_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_custom_op_unbacked_symints_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_custom_scan_op_compiled_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dist_bf16_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_float32_bfloat16_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_float64_float16_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_float64_uint8_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_int16_float16_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_int16_int16_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_int32_int64_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_int32_uint8_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_int64_int32_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_int8_float16_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_int8_uint8_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_uint8_int32_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_uint8_uint8_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_fallback_mutable_op_no_mutated_tensors_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_full_like_sliced_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_fuse_large_params_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_fusing_write_into_disjoint_read_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_graph_partition_misaligned_input_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_graph_partition_pad_dynamic_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_hardsigmoid_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_horizonal_fusion1_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_index3_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_index_dynamic_shapes_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_index_put_deterministic_fallback_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_index_put_reinplace_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_inplace_mixed_dtype_ops_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_inplace_where_pointwise_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_isinf2_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_lgamma_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_like_rands2_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_linear_dynamic_maxautotune_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_log_softmax_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_max_pool2d3_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_max_pool2d_with_indices_backward2_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_min_max_reduction_nan_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_mixed_mm_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_mm_views_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_multi_device_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_multilayer_var_lowp_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_neg_max_uint8_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pad_cast_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_philox_rand_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pointwise_bessel_y0_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pointwise_expm1_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pointwise_hermite_polynomial_he_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pointwise_i1e_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pointwise_xlogy_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_prepare_softmax_with_fast_math_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_randint_kernel_count_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_remove_noop_view_dtype_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_repeat_interleave_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_replication_pad_errors_with_bool_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_scatter_reduce3_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_scheduler_vertical_fusion1_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_sigmoid_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_sin_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_slice_mutation1_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_softmax_backward_data_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_softmax_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_squeeze_varargs_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_strided_inputs_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_sum4_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_sum5_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_sum_keepdims_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_tensor2_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_unspec_inputs_float16_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_unspec_inputs_float64_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_vectorized_ops_masked_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_view_as_real_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_views2_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_arithmetic_constant_folding_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_item_to_inputs_kernel_nobreak_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_pad_dynamic_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_unbacked_cat_backwards_save_data_dependent_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_unbacked_matmul_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_unbacked_reduction_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_unspecialized_float_dynamic_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_unspecialized_float_softshrink_cuda 2025-08-14T21:44:23.6689853Z 2025-08-14T21:44:23.6690318Z Running inductor/test_torchinductor_opinfo 1/18 ... [2025-08-14 21:44:23.635402] 2025-08-14T21:44:23.6691146Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T21:44:23.6691943Z GITHUB_RUN_ID, GITHUB_RUN_ATTEMPT, or ARTIFACTS_FILE_SUFFIX not set, not uploading 2025-08-14T21:44:23.6692760Z Uploading artifacts took 0.00 seconds 2025-08-14T21:44:23.6694967Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_torchinductor_opinfo.py', '-m', 'not serial', '--shard-id=1', '--num-shards=18', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 21:44:23.635731] 2025-08-14T21:47:31.2159847Z 2025-08-14T21:47:31.2161858Z inductor/test_torchinductor_codegen_dynamic_shapes 1/6 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_torchinductor_codegen_dynamic_shapes_1.6_1aa974992816887e_.log 2025-08-14T21:47:31.2337077Z Running 288 items in this shard: test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_aliased_buffer_reuse_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_arange2_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_arange3_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_argmax_argmin_with_duplicates_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_avg_pool2d3_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_avg_pool2d_backward3_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_avg_pool3d_backward2_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_avg_pool3d_backward_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_avg_pool_errors_with_uint_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_batch_norm_2d_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_bernoulli1_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_both_scalars_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_bucketize_int_int64_int32_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_bucketize_int_int8_int32_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_bucketize_int_uint8_int32_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_buffer_copied_in_graph_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_builtins_round_float_ndigits_zero_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_cat_negative_dim_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_config_option_dont_assume_alignment_recompiles_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_consecutive_split_cumprod_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_const_int32_to_float_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_constant_pad_2d_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_constant_pad_nd_inplace_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_conv3d_channels_last_use_block_ptr_False_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_conv3d_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_cudnn_rnn_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_div3_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_div6_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_div9_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_div_by_zero_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dropout_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dropout_trivial_1_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtype_mismatch_issue_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_bfloat16_bfloat16_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_bfloat16_int8_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_float16_float16_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_float16_int32_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_float32_int64_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_float32_uint8_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_float64_int8_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_int16_float64_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_int32_bfloat16_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_int32_float32_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_int32_float64_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_int64_bfloat16_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_int8_float16_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_int8_float64_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_int8_int8_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_embedding_bag_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_exact_stride_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_exp2_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_fft_real_input_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_fmin_fmax_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_fractional_max_pool2d1_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_full_like_sliced_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_full_like_transposed_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_fuse_tiled_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_graph_partition_mutation_real_name_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_graph_partition_no_inputs_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_horizonal_fusion2_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_index_propagation_abs_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_index_propagation_flip_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_index_put_failed_reinplace_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_isin_tensor_scalar_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_isinf2_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_l1_loss_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_large_pointwise_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_lgamma_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_linspace3_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_logcumsumexp_zero_dim_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_logsumexp_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_masked_scatter_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_matmul_layer_norm_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_max_pool2d5_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_misaligned_address_issue1_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_multi_gpu_device_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_mutable_custom_op_fixed_layout_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_nan_sort_stable_False_descending_False_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_narrow_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_new_empty_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_no_mega_fusion_during_lowering_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_output_strides_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_permute1_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_philox_rand_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_pointwise_chebyshev_polynomial_t_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_pointwise_digamma_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_pointwise_gammainc_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_pointwise_log_ndtr_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_pointwise_ndtri_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_pointwise_psi_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_pointwise_shifted_chebyshev_polynomial_w_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_pow2_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_profiler_mark_wrapper_call_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_randint_int64_mod_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_randint_kernel_count_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_randn_generator_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_randn_like_empty_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_reflection_pad2d_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_remove_noop_copy_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_remove_noop_slice_scatter_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_repeat_interleave_Tensor_decomp_int32_nd_1_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_replication_pad_errors_with_bool_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_require_stride_expanded_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_reuse_buffers_with_aliasing_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_rsqrt_dynamic_shapes_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_scalar_output_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_scatter1_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_sdpa_unaligned_mask_freezing_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_select_scatter_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_shape_padding_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_silu_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_single_elem_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_slice1_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_slice_mutation3_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_slice_scatter2_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_slice_scatter4_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_slice_scatter5_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_sort_bool_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_special_polygamma_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_split_cumprod_low_prec_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_split_cumsum_low_prec_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_split_failed_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_split_with_integer_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_stack_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_stride_preservation_with_stride_modifying_fx_pass_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_to_device_constant_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_topk_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_uint4x2_mixed_mm_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_unroll_small_reduction_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_unspec_inputs_int16_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_upsample_bilinear2d_a_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_upsample_bilinear2d_b_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_upsample_nearest3d_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_var_mean_div_by_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_vertical_fusion1_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_view_uint8_through_differing_bitwidths_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_views1_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_views3_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_zero_dim_reductions_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_zero_element_mutation_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_adaptive_avg_pool2d1_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_adaptive_avg_pool2d_low_prec_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_addmv_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_aliased_buffer_reuse_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_aoti_eager_support_str_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_aoti_eager_with_scalar_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_argmax_argmin2_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_argmax_argmin_with_nan_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_argmax_min_int32_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_avg_pool2d7_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_avg_pool2d8_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_avg_pool_errors_with_uint_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_baddbmm_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_bernoulli2_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_bitwise3_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_bucketize_add_autotune_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_bucketize_computed_offsets_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_bucketize_int_int16_int16_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_bucketize_int_int32_int64_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_bucketize_int_int32_uint8_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_bucketize_int_int64_uint8_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_bucketize_int_int8_int16_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_bucketize_int_uint8_int64_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_builtins_round_int_ndigits_pos_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_complex_memory_overlap_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_const_int32_to_float_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_constant_pad_2d_strides_nonpositive_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_constant_pad_3d_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_conv2d_backward_channels_last_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_conv3d_channels_last_use_block_ptr_True_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_convolution3_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_cos_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_cumsum_no_mask_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_cumsum_pattern_matcher_issue_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_custom_op_3_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_custom_op_fixed_layout_channels_last_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_custom_op_unbacked_symints_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_data_type_propogation_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_div9_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_div_precision_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dropout3_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dropout_deterministic_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_bfloat16_int32_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_float32_bfloat16_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_float32_float64_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_float32_int8_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_float64_float64_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_float64_int16_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_int64_float32_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_int64_uint8_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_embedding_bag_byte_unpack_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_exp_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_expanded_reduction_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_fallback_mutable_op_basic_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_fill2_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_flip_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_float16_to_int16_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_float32_to_int32_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_float_repr_dynamic_shapes_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_fractional_max_pool2d1_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_fractional_max_pool2d5_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_full_boolean_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_full_like_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_functionalize_rng_wrappers_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_fuse_tiled_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_gather1_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_graph_partition_arange1_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_graph_partition_both_scalars_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_graph_partition_constant_tensor2_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_graph_partition_mutation_real_name_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_graph_partition_pad_dynamic_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_graph_partition_scalar_inputs_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_index_propagation_abs_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_index_put2_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_index_put_index_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_index_put_reinplace_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_index_remainder_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_indirect_load_broadcast_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_inductor_assert_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_input_mutation1_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_input_mutation4_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_int8_weight_only_quant_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_layer_norm_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_leaky_relu_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_like_channels_last_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_like_rands3_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_like_rands_sliced_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_linspace3_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_long_tensor_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_masked_fill_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_max_pool2d5_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_max_pool2d7_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_max_pool2d_with_indices_backward5_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_min_max_reduction_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_min_max_reduction_nan_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_mixed_mm_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_mm_views_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_mul_softmax_symfloat_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_multilayer_any_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_mutable_custom_op_fixed_layout_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_nan_sort_stable_False_descending_False_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_nan_sort_stable_True_descending_False_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_narrow_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_no_op_reduction_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_permute2_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_pointwise_airy_ai_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_pointwise_chebyshev_polynomial_v_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_pointwise_entr_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_pointwise_erfcx_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_pointwise_exp2_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_pointwise_ndtr_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_pointwise_scaled_modified_bessel_k1_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_pointwise_shifted_chebyshev_polynomial_u_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_pointwise_spherical_bessel_j0_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_prod_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_randint_distribution_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_randint_int64_mod_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_reduction3_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_reflection_pad2d_backward_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_remainder_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_remove_noop_clone_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_remove_noop_copy_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_remove_noop_slice1_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_repeat_interleave_Tensor_decomp_int64_nd_2_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_should_pad_bench_for_bmm_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_signbit_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_simplify_loops_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_slice_scatter4_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_sort_transpose_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_special_polygamma_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_split_cumsum_index_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_split_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_split_with_sizes_with_unbacked_symints_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_squeeze_varargs_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_strided_inputs_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_sum2_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_tensor1_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_tensor2_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_tensor_index_put_slice_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_topk_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_unspec_inputs_int32_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_unspec_inputs_int64_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_unspec_inputs_int8_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_upsample_cat_conv_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_upsample_nearest1d_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_upsample_nearest3d_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_vertical_fusion1_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_zero_element_mutation_dynamic_shapes_cuda 2025-08-14T21:47:31.2448462Z 2025-08-14T21:47:31.2448676Z Running inductor/test_torchinductor_opinfo 5/18 ... [2025-08-14 21:47:31.217358] 2025-08-14T21:47:31.2449028Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T21:47:31.2449834Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_torchinductor_opinfo.py', '-m', 'not serial', '--shard-id=5', '--num-shards=18', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 21:47:31.217729] 2025-08-14T21:53:20.7080708Z 2025-08-14T21:53:20.7088709Z inductor/test_torchinductor_opinfo 1/18 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_torchinductor_opinfo_1.18_936fe34df54f658d_.log 2025-08-14T21:53:20.7197846Z Running 209 items in this shard: test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_H_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___radd___cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive__chunk_cat_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive__chunk_cat_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive__unsafe_masked_index_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive__unsafe_masked_index_put_accumulate_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_acosh_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_addbmm_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_addmv_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_addr_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_amin_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_amin_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_angle_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_any_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_any_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_arange_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_argmax_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_argsort_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_as_strided_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_as_strided_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_asinh_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_atan2_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_atleast_2d_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_bitwise_and_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_bitwise_or_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_block_diag_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_broadcast_tensors_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_broadcast_tensors_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cfloat_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_chalf_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cholesky_inverse_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cholesky_inverse_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_clamp_max_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_clamp_min_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_corrcoef_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cos_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cosh_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cosh_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_count_nonzero_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_count_nonzero_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cumprod_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cumulative_trapezoid_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_deg2rad_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diag_embed_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diag_embed_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diagonal_copy_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diagonal_copy_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diagonal_scatter_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_dist_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_dstack_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_empty_permuted_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_erf_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_exp2_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_exp_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_expand_as_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_expand_copy_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_expand_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_eye_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_eye_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_fft_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_hfft2_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ifft_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ihfft_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_rfft2_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_rfft_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_rfft_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fill_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_flip_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_float_power_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fmod_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_gather_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ge_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_grid_sampler_2d_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_hash_tensor_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_i0_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_igammac_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_add_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_copy_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_reduce_amin_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_select_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_int_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isinf_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isinf_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isposinf_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isreal_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_item_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_jiterator_binary_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_jiterator_binary_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_jiterator_binary_return_by_ref_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_jiterator_unary_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_kthvalue_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_lcm_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ldexp_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_diagonal_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_diagonal_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_eigvalsh_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_lstsq_grad_oriented_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_lu_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_lu_factor_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_qr_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_vander_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_log2_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_log_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_log_normal_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_log_softmax_with_dtype_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logical_and_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_long_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_lt_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_lu_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mT_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_select_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_softmin_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_max_binary_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_max_reduction_no_dim_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_maximum_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_meshgrid_list_of_tensors_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_min_binary_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_min_binary_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_min_reduction_no_dim_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mul_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mvlgamma_mvlgamma_p_3_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nan_to_num_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nanmedian_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nanmedian_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ne_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_new_empty_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_new_empty_strided_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_new_full_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_adaptive_avg_pool3d_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_adaptive_max_pool3d_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_avg_pool3d_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_conv_transpose2d_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_cosine_similarity_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_dropout_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_elu_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_embedding_bag_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_interpolate_nearest_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_interpolate_trilinear_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_kl_div_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_leaky_relu_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_max_unpool2d_grad_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_multilabel_margin_loss_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pairwise_distance_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_prelu_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_relu_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_rms_norm_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_scaled_dot_product_attention_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_softplus_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_softsign_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_threshold_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_norm_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_norm_inf_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_pca_lowrank_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_permute_copy_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_permute_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_polygamma_polygamma_n_2_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_polygamma_polygamma_n_3_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_pow_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_put_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_rad2deg_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_remainder_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_reshape_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_reshape_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_resize__cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_resize_as__cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_round_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_round_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_round_decimals_neg_3_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scalar_tensor_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scatter_reduce_amax_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scatter_reduce_amax_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scatter_reduce_mean_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_searchsorted_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_select_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_signal_windows_kaiser_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_softmax_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_bessel_j1_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_erfcx_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_hermite_polynomial_h_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_i0e_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_scaled_modified_bessel_k0_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_scaled_modified_bessel_k1_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_u_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_zeta_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_zeta_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_split_list_args_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_split_with_sizes_copy_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_stack_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sub_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sum_to_size_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_t_copy_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_take_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_take_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_tile_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_topk_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_torch_ops_aten__safe_softmax_default_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_transpose_copy_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_transpose_copy_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_tril_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_tril_indices_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_trunc_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unflatten_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unsafe_chunk_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unsafe_split_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_var_mean_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_view_as_complex_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_view_copy_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_vstack_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_zeros_like_cuda_int32 2025-08-14T21:53:20.7330887Z 2025-08-14T21:53:20.7331348Z Running inductor/test_torchinductor_opinfo 11/18 ... [2025-08-14 21:53:20.708532] 2025-08-14T21:53:20.7332190Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T21:53:20.7333330Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_torchinductor_opinfo.py', '-m', 'not serial', '--shard-id=11', '--num-shards=18', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 21:53:20.709053] 2025-08-14T21:55:32.4709881Z 2025-08-14T21:55:32.4724028Z inductor/test_torchinductor_opinfo 5/18 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_torchinductor_opinfo_5.18_66a5d0a03ce7bd02_.log 2025-08-14T21:55:32.4874115Z Running 197 items in this shard: test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_T_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___radd___cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rmatmul___cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rmatmul___cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rmod___cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rmul___cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rsub___cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive__chunk_cat_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive__native_batch_norm_legit_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive__segment_reduce_offsets_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_abs_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_abs_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_acos_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_add_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_addcmul_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_amin_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_argsort_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_as_strided_copy_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_as_strided_copy_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_baddbmm_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_bfloat16_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_bincount_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_bitwise_and_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_bitwise_left_shift_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_bitwise_not_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_bitwise_or_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_bool_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_broadcast_to_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_broadcast_to_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cartesian_prod_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cdist_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cdouble_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cdouble_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ceil_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_chalf_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_char_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_char_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cholesky_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_combinations_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cos_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cov_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diag_embed_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diagflat_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diagonal_scatter_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_digamma_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_div_no_rounding_mode_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_dot_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_dsplit_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_dstack_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_exp2_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_exp_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_expand_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_fftn_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_fftn_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_hfftn_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_hfftn_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ifft2_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ifft2_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ifft_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ifftshift_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ihfft_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fliplr_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_floor_divide_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fmax_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fmax_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_geometric_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_hstack_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_add_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_reduce_mean_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_reduce_mean_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_select_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_int_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_jiterator_binary_return_by_ref_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_norm_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_norm_subgradients_at_zero_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_vector_norm_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_log10_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_log1p_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_log2_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_log_normal_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_log_softmax_with_dtype_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_log_softmax_with_dtype_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logcumsumexp_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logcumsumexp_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logical_and_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logical_not_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logical_or_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_long_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_long_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_lu_unpack_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mT_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_argmax_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_argmax_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_argmax_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_select_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_softmax_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_sum_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_var_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_max_binary_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_max_reduction_no_dim_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mean_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_meshgrid_list_of_tensors_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_meshgrid_variadic_tensors_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_min_binary_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_min_reduction_no_dim_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_min_reduction_no_dim_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_minimum_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mode_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_movedim_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_msort_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mvlgamma_mvlgamma_p_3_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nanmedian_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_narrow_copy_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_narrow_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_native_layer_norm_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ne_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_new_empty_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_new_empty_strided_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_new_ones_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_new_zeros_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nextafter_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_avg_pool1d_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_max_pool1d_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pad_replicate_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pixel_shuffle_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pixel_shuffle_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_poisson_nll_loss_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_prelu_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_rms_norm_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_softsign_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_upsample_nearest_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nonzero_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nonzero_static_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_polygamma_polygamma_n_2_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_polygamma_polygamma_n_3_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_polygamma_polygamma_n_4_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_randint_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_randn_like_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_repeat_interleave_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_reshape_as_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_reshape_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_resolve_neg_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_round_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_round_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scalar_tensor_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scatter_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scatter_reduce_amin_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sigmoid_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_airy_ai_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_bessel_y0_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_chebyshev_polynomial_u_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_chebyshev_polynomial_v_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_chebyshev_polynomial_w_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_chebyshev_polynomial_w_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_hermite_polynomial_h_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_modified_bessel_i0_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_modified_bessel_i1_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_modified_bessel_k0_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_modified_bessel_k1_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_scaled_modified_bessel_k1_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_w_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_spherical_bessel_j0_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_spherical_bessel_j0_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_split_with_sizes_copy_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sqrt_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_square_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_squeeze_copy_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_squeeze_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_stack_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_std_mean_unbiased_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sub_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sum_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sum_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_t_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_take_along_dim_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_take_along_dim_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_tile_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_to_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_torch_ops_aten__safe_softmax_default_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_trace_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_transpose_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_triangular_solve_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_true_divide_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_trunc_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_trunc_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unbind_copy_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unbind_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unsafe_chunk_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unsqueeze_copy_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_var_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_var_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_var_unbiased_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_view_as_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_vsplit_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_vstack_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_zeros_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_zeros_like_cuda_float16 2025-08-14T21:55:32.5025355Z 2025-08-14T21:55:32.5025794Z Running inductor/test_torchinductor_opinfo 15/18 ... [2025-08-14 21:55:32.470843] 2025-08-14T21:55:32.5026611Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T21:55:32.5028536Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_torchinductor_opinfo.py', '-m', 'not serial', '--shard-id=15', '--num-shards=18', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 21:55:32.471217] 2025-08-14T22:02:14.9573737Z 2025-08-14T22:02:14.9577663Z inductor/test_torchinductor_opinfo 11/18 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_torchinductor_opinfo_11.18_5ce6c8c714086e8c_.log 2025-08-14T22:02:14.9670610Z Running 205 items in this shard: test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___getitem___cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rand___cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rmod___cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rmod___cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rpow___cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rpow___cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive__segment_reduce_offsets_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive__unsafe_masked_index_put_accumulate_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_addcmul_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_addmm_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_alias_copy_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_any_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_any_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_argwhere_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_argwhere_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_as_strided_partial_views_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_as_strided_partial_views_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_as_strided_scatter_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_as_strided_scatter_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_asinh_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_atanh_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_atanh_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_baddbmm_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_bitwise_and_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_block_diag_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_broadcast_to_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cartesian_prod_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_chalf_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_char_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_clamp_min_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_clone_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_column_stack_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_conj_physical_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_constant_pad_nd_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cos_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_count_nonzero_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cummax_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cummax_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cumprod_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cumprod_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cumulative_trapezoid_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diag_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diagonal_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_digamma_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_div_trunc_rounding_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_empty_like_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_empty_permuted_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_eq_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_erf_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_erfinv_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_exp_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_expand_copy_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_expand_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_eye_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_fft2_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_fft2_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_hfftn_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ihfft_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ihfftn_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_irfft_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_irfft_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_irfftn_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_flip_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_flipud_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_float_power_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fmin_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_full_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_full_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_grid_sampler_2d_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_gt_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_half_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_histc_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_add_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_add_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_fill_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isclose_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isnan_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isnan_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isposinf_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_jiterator_2inputs_2outputs_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_jiterator_binary_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_jiterator_binary_return_by_ref_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_cross_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_diagonal_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_householder_product_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_matrix_rank_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_log2_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logit_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logspace_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mT_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_amax_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_amin_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_cumprod_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_select_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_std_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_median_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_min_reduction_with_dim_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_minimum_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_msort_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nan_to_num_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nansum_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_narrow_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_native_dropout_backward_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_new_full_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_adaptive_max_pool1d_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_alpha_dropout_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_binary_cross_entropy_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_binary_cross_entropy_with_logits_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_conv_transpose3d_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_dropout2d_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_dropout3d_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_gaussian_nll_loss_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_hardtanh_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_interpolate_area_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_interpolate_linear_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_interpolate_nearest-exact_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_l1_loss_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_max_pool2d_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_max_unpool1d_grad_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_mse_loss_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_multilabel_soft_margin_loss_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pad_replicate_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pairwise_distance_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_relu_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_softmin_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_tanhshrink_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_triplet_margin_with_distance_loss_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_upsample_bilinear_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_normal_number_mean_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ones_like_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ones_like_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_permute_copy_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_permute_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_polar_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_polygamma_polygamma_n_0_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_polygamma_polygamma_n_3_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_pow_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_rad2deg_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_randint_like_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_randint_like_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_randn_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_real_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_renorm_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_repeat_interleave_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_resize__cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_resize__cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_resolve_neg_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_resolve_neg_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_resolve_neg_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_roll_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_rsub_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scatter_reduce_amax_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scatter_reduce_prod_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scatter_reduce_sum_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_short_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sigmoid_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sign_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sign_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_signal_windows_gaussian_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_signal_windows_hamming_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_signal_windows_nuttall_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_slice_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sort_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_bessel_j0_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_bessel_y1_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_chebyshev_polynomial_w_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_erfcx_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_hermite_polynomial_he_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_hermite_polynomial_he_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_i1_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_legendre_polynomial_p_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_modified_bessel_i0_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_modified_bessel_k0_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_modified_bessel_k1_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_ndtri_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_ndtri_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_polygamma_special_polygamma_n_0_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_zeta_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_split_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_split_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_split_list_args_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_split_with_sizes_copy_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sqrt_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_squeeze_copy_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_stack_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_stft_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_take_along_dim_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_take_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_tanh_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_tanh_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_to_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_topk_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_torch_ops_aten__efficient_attention_forward_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_true_divide_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unbind_copy_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unique_consecutive_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unique_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unravel_index_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_view_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_view_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_vsplit_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_where_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_zero__cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_zeros_like_cuda_uint8 2025-08-14T22:02:14.9753859Z 2025-08-14T22:02:14.9754064Z Running inductor/test_cpu_repro 2/4 ... [2025-08-14 22:02:14.957662] 2025-08-14T22:02:14.9754452Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T22:02:14.9755398Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_cpu_repro.py', '-m', 'not serial', '--shard-id=2', '--num-shards=4', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 22:02:14.958016] 2025-08-14T22:04:59.4845691Z 2025-08-14T22:04:59.4846707Z inductor/test_torchinductor_opinfo 15/18 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_torchinductor_opinfo_15.18_863b6ae79c1ebfb2_.log 2025-08-14T22:04:59.4909753Z Running 177 items in this shard: test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rand___cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rsub___cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive__unsafe_masked_index_put_accumulate_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive__upsample_bilinear2d_aa_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_acos_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_add_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_add_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_addcmul_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_allclose_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_aminmax_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_angle_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_argmin_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_argwhere_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_atan_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_atanh_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_bernoulli_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_bool_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_bool_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_broadcast_tensors_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_broadcast_to_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_byte_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cartesian_prod_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ceil_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_clamp_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_column_stack_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_column_stack_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_combinations_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_conj_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_conj_physical_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_contiguous_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_copysign_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cross_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cumsum_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_deg2rad_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diag_embed_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diagonal_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diff_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_div_floor_rounding_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_div_trunc_rounding_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_empty_permuted_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_empty_strided_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_eq_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_hfft_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_hfft_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_hfftn_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ifft_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ifftn_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_irfft_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fill_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_flatten_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_flatten_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_flip_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fliplr_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_float_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_floor_divide_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_frexp_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_gather_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_gather_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_gather_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_histc_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_igamma_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_fill_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_reduce_amin_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_reduce_prod_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isfinite_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isin_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isreal_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_jiterator_2inputs_2outputs_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_lerp_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_lgamma_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_lgamma_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_lgamma_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_lgamma_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_cholesky_ex_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_cond_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_eigh_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_ldl_factor_ex_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_lu_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_matrix_norm_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_multi_dot_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_log2_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logical_not_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logical_or_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logical_xor_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_lu_solve_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mH_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_amax_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_argmax_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_logsumexp_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_mean_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_normalize_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_normalize_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_softmin_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_sum_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_matrix_exp_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_maximum_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_meshgrid_list_of_tensors_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_minimum_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_movedim_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mul_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nanquantile_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nansum_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nansum_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ne_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nextafter_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_adaptive_max_pool3d_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_channel_shuffle_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_conv3d_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_conv_transpose3d_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_cosine_embedding_loss_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_dropout2d_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_fractional_max_pool2d_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_grid_sample_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_interpolate_bilinear_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_margin_ranking_loss_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_mse_loss_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_multilabel_margin_loss_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pad_replicate_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_relu6_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_relu6_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_relu_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_smooth_l1_loss_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_softmin_with_dtype_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_tanhshrink_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_threshold_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_norm_nuc_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_normal_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ones_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ones_like_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ones_like_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_polar_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_prod_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_randn_like_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_repeat_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_repeat_interleave_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_reshape_as_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_reshape_as_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_resolve_conj_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_round_decimals_3_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_rsub_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_short_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_signal_windows_cosine_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_signbit_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sin_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sort_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sparse_sampled_addmm_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_chebyshev_polynomial_w_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_hermite_polynomial_h_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_i0e_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_i1e_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_laguerre_polynomial_l_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_modified_bessel_i0_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_modified_bessel_i1_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_modified_bessel_k0_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_modified_bessel_k1_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_ndtri_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_t_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_u_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_w_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_zeta_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_std_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_std_mean_unbiased_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_stft_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_svd_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_t_copy_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_take_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_topk_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_trapezoid_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_triu_indices_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unsafe_chunk_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unsafe_split_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unsqueeze_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_var_mean_unbiased_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_view_as_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_view_as_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_vstack_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_zeros_cuda_float64 2025-08-14T22:04:59.4970556Z 2025-08-14T22:04:59.4970727Z Running dynamo/test_dynamic_shapes 3/3 ... [2025-08-14 22:04:59.484724] 2025-08-14T22:04:59.4971068Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T22:04:59.4971435Z GITHUB_RUN_ID, GITHUB_RUN_ATTEMPT, or ARTIFACTS_FILE_SUFFIX not set, not uploading 2025-08-14T22:04:59.4971806Z Uploading artifacts took 0.00 seconds 2025-08-14T22:04:59.4972634Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_dynamic_shapes.py', '-m', 'not serial', '--shard-id=3', '--num-shards=3', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 22:04:59.484995] 2025-08-14T22:08:53.7958525Z 2025-08-14T22:08:53.7964531Z inductor/test_cpu_repro 2/4 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_cpu_repro_2.4_c97d55bba141a906_.log 2025-08-14T22:08:53.8106016Z Running 178 items in this shard: test/inductor/test_cpu_repro.py::CPUReproTests::test_argmax_argmin_with_nan_value, test/inductor/test_cpu_repro.py::CPUReproTests::test_broadcast_scalar_cpp_tile_2d_kernel, test/inductor/test_cpu_repro.py::CPUReproTests::test_channels_last_view_as_complex, test/inductor/test_cpu_repro.py::CPUReproTests::test_complex_memory_overlap, test/inductor/test_cpu_repro.py::CPUReproTests::test_concat_inner_vec, test/inductor/test_cpu_repro.py::CPUReproTests::test_constant_store, test/inductor/test_cpu_repro.py::CPUReproTests::test_conv_stride_constraints, test/inductor/test_cpu_repro.py::CPUReproTests::test_convert_int32_to_int64_vec, test/inductor/test_cpu_repro.py::CPUReproTests::test_cpp_kernel_profile, test/inductor/test_cpu_repro.py::CPUReproTests::test_dequant_maxpool2d_lowering_int8, test/inductor/test_cpu_repro.py::CPUReproTests::test_dequant_quant_lowering_int8, test/inductor/test_cpu_repro.py::CPUReproTests::test_dequant_quant_lowering_uint8, test/inductor/test_cpu_repro.py::CPUReproTests::test_dequant_relu_quant_dequant_relu_quant_lowering_int8, test/inductor/test_cpu_repro.py::CPUReproTests::test_disabled_amp_is_inference_True, test/inductor/test_cpu_repro.py::CPUReproTests::test_dropout, test/inductor/test_cpu_repro.py::CPUReproTests::test_embedding_vec_bf16, test/inductor/test_cpu_repro.py::CPUReproTests::test_for_loop_collapsed, test/inductor/test_cpu_repro.py::CPUReproTests::test_fp8_cast_bfloat16_shape_15,3,13, test/inductor/test_cpu_repro.py::CPUReproTests::test_fp8_cast_bfloat16_shape_4,2048,4096, test/inductor/test_cpu_repro.py::CPUReproTests::test_frexp, test/inductor/test_cpu_repro.py::CPUReproTests::test_horizontal_fusion, test/inductor/test_cpu_repro.py::CPUReproTests::test_in_out_buffer, test/inductor/test_cpu_repro.py::CPUReproTests::test_index_propagation_issue_102065, test/inductor/test_cpu_repro.py::CPUReproTests::test_inplace_add_alpha, test/inductor/test_cpu_repro.py::CPUReproTests::test_invalid_index_of_empty_tensor, test/inductor/test_cpu_repro.py::CPUReproTests::test_ir_node_str, test/inductor/test_cpu_repro.py::CPUReproTests::test_issue122380, test/inductor/test_cpu_repro.py::CPUReproTests::test_load_half, test/inductor/test_cpu_repro.py::CPUReproTests::test_local_buffer_with_line_reuse, test/inductor/test_cpu_repro.py::CPUReproTests::test_logical_op_store_to_lowp_data_dtype, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_1_bidirectional_False_bias_False_empty_state_False_batch_first_False_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_1_bidirectional_False_bias_False_empty_state_False_batch_first_True_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_1_bidirectional_False_bias_False_empty_state_True_batch_first_False_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_1_bidirectional_False_bias_True_empty_state_False_batch_first_True_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_1_bidirectional_False_bias_True_empty_state_True_batch_first_False_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_1_bidirectional_False_bias_True_empty_state_True_batch_first_True_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_1_bidirectional_True_bias_False_empty_state_False_batch_first_True_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_1_bidirectional_True_bias_False_empty_state_False_batch_first_True_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_1_bidirectional_True_bias_False_empty_state_False_batch_first_True_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_1_bidirectional_True_bias_False_empty_state_True_batch_first_False_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_1_bidirectional_True_bias_False_empty_state_True_batch_first_False_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_1_bidirectional_True_bias_False_empty_state_True_batch_first_True_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_False_batch_first_False_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_False_batch_first_True_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_True_batch_first_False_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_True_batch_first_True_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_True_batch_first_True_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_7_bidirectional_False_bias_False_empty_state_False_batch_first_False_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_7_bidirectional_False_bias_False_empty_state_True_batch_first_True_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_7_bidirectional_False_bias_True_empty_state_False_batch_first_False_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_7_bidirectional_False_bias_True_empty_state_False_batch_first_False_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_7_bidirectional_False_bias_True_empty_state_False_batch_first_False_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_7_bidirectional_False_bias_True_empty_state_True_batch_first_False_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_7_bidirectional_True_bias_False_empty_state_False_batch_first_True_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_7_bidirectional_True_bias_False_empty_state_False_batch_first_True_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_7_bidirectional_True_bias_True_empty_state_False_batch_first_False_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_7_bidirectional_True_bias_True_empty_state_False_batch_first_True_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_7_bidirectional_True_bias_True_empty_state_False_batch_first_True_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_7_bidirectional_True_bias_True_empty_state_True_batch_first_False_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_7_bidirectional_True_bias_True_empty_state_True_batch_first_False_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_1_bidirectional_False_bias_False_empty_state_False_batch_first_True_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_1_bidirectional_False_bias_False_empty_state_False_batch_first_True_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_1_bidirectional_False_bias_False_empty_state_True_batch_first_True_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_1_bidirectional_False_bias_True_empty_state_False_batch_first_True_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_1_bidirectional_False_bias_True_empty_state_True_batch_first_False_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_1_bidirectional_False_bias_True_empty_state_True_batch_first_False_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_1_bidirectional_False_bias_True_empty_state_True_batch_first_False_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_1_bidirectional_False_bias_True_empty_state_True_batch_first_True_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_False_empty_state_False_batch_first_False_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_False_empty_state_False_batch_first_False_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_False_empty_state_False_batch_first_True_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_False_empty_state_True_batch_first_False_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_False_empty_state_True_batch_first_False_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_False_batch_first_False_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_False_batch_first_True_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_True_batch_first_False_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_7_bidirectional_False_bias_False_empty_state_False_batch_first_False_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_7_bidirectional_False_bias_False_empty_state_False_batch_first_True_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_7_bidirectional_False_bias_False_empty_state_False_batch_first_True_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_7_bidirectional_False_bias_False_empty_state_True_batch_first_True_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_7_bidirectional_False_bias_True_empty_state_False_batch_first_False_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_7_bidirectional_False_bias_True_empty_state_False_batch_first_True_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_7_bidirectional_False_bias_True_empty_state_True_batch_first_False_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_7_bidirectional_False_bias_True_empty_state_True_batch_first_False_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_7_bidirectional_False_bias_True_empty_state_True_batch_first_True_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_7_bidirectional_False_bias_True_empty_state_True_batch_first_True_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_7_bidirectional_True_bias_False_empty_state_False_batch_first_False_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_7_bidirectional_True_bias_False_empty_state_False_batch_first_True_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_7_bidirectional_True_bias_False_empty_state_True_batch_first_False_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_7_bidirectional_True_bias_False_empty_state_True_batch_first_False_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_7_bidirectional_True_bias_True_empty_state_False_batch_first_False_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_7_bidirectional_True_bias_True_empty_state_True_batch_first_False_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_1_bidirectional_False_bias_False_empty_state_False_batch_first_False_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_1_bidirectional_False_bias_False_empty_state_False_batch_first_False_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_1_bidirectional_False_bias_False_empty_state_True_batch_first_False_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_1_bidirectional_False_bias_False_empty_state_True_batch_first_True_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_1_bidirectional_False_bias_True_empty_state_False_batch_first_True_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_1_bidirectional_False_bias_True_empty_state_True_batch_first_True_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_1_bidirectional_False_bias_True_empty_state_True_batch_first_True_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_1_bidirectional_True_bias_False_empty_state_True_batch_first_True_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_1_bidirectional_True_bias_False_empty_state_True_batch_first_True_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_False_batch_first_False_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_False_batch_first_False_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_False_batch_first_True_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_True_batch_first_True_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_True_batch_first_True_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_True_batch_first_True_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_7_bidirectional_False_bias_False_empty_state_True_batch_first_True_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_7_bidirectional_False_bias_True_empty_state_False_batch_first_False_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_7_bidirectional_False_bias_True_empty_state_False_batch_first_False_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_7_bidirectional_False_bias_True_empty_state_False_batch_first_True_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_7_bidirectional_False_bias_True_empty_state_True_batch_first_False_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_7_bidirectional_False_bias_True_empty_state_True_batch_first_False_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_7_bidirectional_True_bias_False_empty_state_True_batch_first_True_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_7_bidirectional_True_bias_True_empty_state_False_batch_first_True_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_7_bidirectional_True_bias_True_empty_state_True_batch_first_True_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_7_bidirectional_True_bias_True_empty_state_True_batch_first_True_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_1_bidirectional_False_bias_False_empty_state_False_batch_first_False_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_1_bidirectional_False_bias_False_empty_state_True_batch_first_False_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_1_bidirectional_False_bias_False_empty_state_True_batch_first_False_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_1_bidirectional_False_bias_False_empty_state_True_batch_first_True_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_1_bidirectional_False_bias_False_empty_state_True_batch_first_True_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_1_bidirectional_False_bias_True_empty_state_True_batch_first_True_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_1_bidirectional_False_bias_True_empty_state_True_batch_first_True_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_False_empty_state_False_batch_first_False_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_False_empty_state_False_batch_first_False_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_False_empty_state_False_batch_first_True_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_False_empty_state_True_batch_first_False_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_False_empty_state_True_batch_first_True_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_False_batch_first_True_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_True_batch_first_False_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_True_batch_first_False_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_7_bidirectional_False_bias_False_empty_state_False_batch_first_False_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_7_bidirectional_False_bias_False_empty_state_False_batch_first_True_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_7_bidirectional_False_bias_False_empty_state_False_batch_first_True_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_7_bidirectional_False_bias_False_empty_state_True_batch_first_False_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_7_bidirectional_False_bias_False_empty_state_True_batch_first_True_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_7_bidirectional_True_bias_False_empty_state_False_batch_first_False_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_7_bidirectional_True_bias_False_empty_state_False_batch_first_False_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_7_bidirectional_True_bias_False_empty_state_False_batch_first_True_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_7_bidirectional_True_bias_False_empty_state_True_batch_first_True_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_7_bidirectional_True_bias_False_empty_state_True_batch_first_True_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_7_bidirectional_True_bias_True_empty_state_False_batch_first_False_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_7_bidirectional_True_bias_True_empty_state_False_batch_first_True_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_7_bidirectional_True_bias_True_empty_state_True_batch_first_False_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_masked_load_int64_vec, test/inductor/test_cpu_repro.py::CPUReproTests::test_max_reduction_lowp_fp, test/inductor/test_cpu_repro.py::CPUReproTests::test_maxpool2d_with_pre_loop_collapse_cpu_only, test/inductor/test_cpu_repro.py::CPUReproTests::test_memory_copy_with_fusion, test/inductor/test_cpu_repro.py::CPUReproTests::test_meta_device, test/inductor/test_cpu_repro.py::CPUReproTests::test_mkl_linear, test/inductor/test_cpu_repro.py::CPUReproTests::test_multihead_attention_cpu, test/inductor/test_cpu_repro.py::CPUReproTests::test_new_vec_op_cpu_only, test/inductor/test_cpu_repro.py::CPUReproTests::test_nn_param_assign_wrapped, test/inductor/test_cpu_repro.py::CPUReproTests::test_non_contiguous_index_with_constant_stride, test/inductor/test_cpu_repro.py::CPUReproTests::test_non_contiguous_reduction_store, test/inductor/test_cpu_repro.py::CPUReproTests::test_parallel_num_threads, test/inductor/test_cpu_repro.py::CPUReproTests::test_pow_cos, test/inductor/test_cpu_repro.py::CPUReproTests::test_reduce_with_masked, test/inductor/test_cpu_repro.py::CPUReproTests::test_reduction_cpu_only, test/inductor/test_cpu_repro.py::CPUReproTests::test_redundant_to_node_elimination_lowp_fp, test/inductor/test_cpu_repro.py::CPUReproTests::test_relu_with_inf_value, test/inductor/test_cpu_repro.py::CPUReproTests::test_require_stride_order_non_owning, test/inductor/test_cpu_repro.py::CPUReproTests::test_scatter_using_atomic_add, test/inductor/test_cpu_repro.py::CPUReproTests::test_scatter_using_atomic_add_vec, test/inductor/test_cpu_repro.py::CPUReproTests::test_share_local_buffers_in_outer_loop_fusion, test/inductor/test_cpu_repro.py::CPUReproTests::test_sigmoid_with_reduction, test/inductor/test_cpu_repro.py::CPUReproTests::test_store_reduction, test/inductor/test_cpu_repro.py::CPUReproTests::test_tile2d_load_decomposed_dequant_add_relu_quant_int8, test/inductor/test_cpu_repro.py::CPUReproTests::test_to_dtype_float_bool, test/inductor/test_cpu_repro.py::CPUReproTests::test_torch_logit, test/inductor/test_cpu_repro.py::CPUReproTests::test_transpose_sum2d_cpu_only, test/inductor/test_cpu_repro.py::CPUReproTests::test_uint64_pointwise_vec, test/inductor/test_cpu_repro.py::CPUReproTests::test_uint64_reduction_vec, test/inductor/test_cpu_repro.py::CPUReproTests::test_vec_bitwise, test/inductor/test_cpu_repro.py::CPUReproTests::test_vec_indirect_load_cse_cache, test/inductor/test_cpu_repro.py::CPUReproTests::test_vec_logical, test/inductor/test_cpu_repro.py::CPUReproTests::test_view_dtype 2025-08-14T22:08:53.8255202Z 2025-08-14T22:08:53.8255444Z Running inductor/test_mkldnn_pattern_matcher 2/2 ... [2025-08-14 22:08:53.795852] 2025-08-14T22:08:53.8255888Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T22:08:53.8256900Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_mkldnn_pattern_matcher.py', '-m', 'not serial', '--shard-id=2', '--num-shards=2', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 22:08:53.796493] 2025-08-14T22:13:52.8628356Z 2025-08-14T22:13:52.8632032Z dynamo/test_dynamic_shapes 3/3 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_dynamic_shapes_3.3_967c7dc45c059935_.log 2025-08-14T22:13:52.9000733Z Running 632 items in this shard: test/dynamo/test_dynamic_shapes.py::DynamicShapesCtxManagerTests::test_autocast_cpu_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesCtxManagerTests::test_autocast_device_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesCtxManagerTests::test_autocast_sdpa_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesCtxManagerTests::test_autograd_profiler_enabled_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesCtxManagerTests::test_context_wrapping_set_grad_enabled_nested_function_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesCtxManagerTests::test_cuda_device_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesCtxManagerTests::test_cuda_event_created_outside_of_graph_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesCtxManagerTests::test_cuda_event_method_create_stream_outside_of_compile_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesCtxManagerTests::test_cuda_stream_method_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesCtxManagerTests::test_disable_saved_tensors_hooks_graph_break_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesCtxManagerTests::test_graph_break_inlining_grad_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesCtxManagerTests::test_is_autocast_cpu_enabled_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesCtxManagerTests::test_nested_generic_context_manager_customized_ctx_manager_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesCtxManagerTests::test_nested_generic_context_manager_with_graph_break_CustomizedCtxManager_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesCtxManagerTests::test_nested_generic_context_manager_with_graph_break_customized_ctx_manager_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesCtxManagerTests::test_return_context_manager_with_graph_break_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesCtxManagerTests::test_torch_profiler_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_add_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_are_functorch_transforms_active_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_call_dict2_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_call_dict3_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_call_dict4_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_call_dict5_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_callable_class_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_callable_lambda_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_chunks1_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_class_dict_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_complex_closure_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_const_tuple_add1_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_constant4_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_default_dict_closure_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_default_dict_dict_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_default_dict_set_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_default_dict_tuple_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_defaultdict_setdefault1_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_defaultdict_setdefault2_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_del_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_deque_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_dict_copy_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_dict_key_set1_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_dict_kwargs_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_dict_param_keys_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_dict_sorted_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_dict_tuple_lazy_guard_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_distributed_is_available_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_dtype_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_elipsis_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_enumerate_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_enumerate_reconstruct_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_filter_fallback_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_filter_reconstruct_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_finfo_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_foreach_lerp__dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_fstrings3_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_funcdef_closure_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_functools_partial_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_getattr_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_globalmodule_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_import1_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_in_not_in_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_index_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_indirect1_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_indirect2_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_indirect3_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_inline_lru_cache_fn_with_default_args_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_inline_script_if_tracing_fn_with_default_args_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_inline_softmax_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_is_complex_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_is_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_is_fx_tracing_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_is_inference_mode_global_recompilation_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_is_quantized_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_itemgetter_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_itertools_compress_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_itertools_permutations_args_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_itertools_permutations_basic_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_itertools_product_args_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_itertools_product_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_itertools_reconstruct_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_len_constant_dict_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_len_tensor_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_list_add_then_mutate_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_list_index_with_constant_tensor_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_list_setitem_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_list_sorted1_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_list_sorted2_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_listarg3_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_listarg5_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_map_deque_extendleft_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_map_enumerate_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_map_list_extend_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_map_max_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_map_partial_unpack_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_map_reduce_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_map_set_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_map_sum_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_min_max_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_namedtuple_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_ndarray_builtin_functions_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_ndarray_transpose_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_np_constant_collections_guards_int_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_np_finfo_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_number_method_method_as_integer_ratio_num_type0_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_number_method_method_conjugate_num_type2_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_numpy_fft_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_numpy_meshgrid_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_obj_is_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_ordered_dict_kwargs_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_partial_across_graph_break_uninvoked_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_partials_graph_break_reconstruct_mix_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_partials_graph_break_reconstruct_mix_no_source_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_partials_hasattr_attr___builtins___dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_partials_hasattr_attr___code___dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_partials_hasattr_attr___eq___dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_partials_hasattr_attr___ge___dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_partials_hasattr_attr___get___dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_partials_hasattr_attr___globals___dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_partials_hasattr_attr___gt___dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_partials_hasattr_attr___hash___dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_partials_hasattr_attr___init___dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_partials_hasattr_attr___init_subclass___dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_partials_hasattr_attr___lt___dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_partials_hasattr_attr___module___dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_partials_hasattr_attr___name___dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_partials_hasattr_attr___new___dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_partials_hasattr_attr___qualname___dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_partials_hasattr_attr___subclasshook___dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_partials_hasattr_attr_args_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_partials_recompilation_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_partials_torch_op_kwarg_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_partials_udf_kwarg_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_range1_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_range2_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_range_with_slice_index_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_reduce_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_reduce_with_single_with_initial_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_return_dict2_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_return_dict_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_return_tuple2_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_returning_recursive_func_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_round_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_set_add_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_set_in_frozenset_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_set_update_list_with_duplicated_items_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_shape2_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_slice5_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_slice6_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_slice_eq_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_startswith_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_sum_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_sum_shortcut_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_sum_shortcut_with_start_arg_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_sum_shortcut_with_start_kwarg_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_sum_with_start_kwarg_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_torch_from_numpy_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_transpose_for_scores_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_tuple_contains_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_tuple_sorted_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_unpack_ex1_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_unpack_ex2_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_zip_longest_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_zip_reconstruct_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_add_sizes_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_any_all_symnode_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_arange_length_with_float32_dtype_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_assert_size_stride_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_assigning_function_to_class_attribute_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_assigning_function_to_object_attribute_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_bound_shape_checks_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_build_tuple_unpack_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_builtin_abs_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_builtin_isinstance_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_builtin_str_on_user_defined_function_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_cannot_trace_mark_dynamic_safe_unreached_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_cast_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_catch_watchings2_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_cell_output2_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_class_has_instancecheck_method_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_closure_recompiles_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_compare_shapes_eq_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_compare_tensor_with_none_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_cond_export_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_cond_side_effects_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_cond_with_quantization_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_data_access_in_inference_mode_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_data_ptr_graph_break_aten_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_dataclass_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_default_args_device_dtype_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_deque_append_left_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_descriptor_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_descriptor_side_effect_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_dictcomp_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_disable_flag_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_dunder_new_function_inlining1_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_dunder_new_function_inlining4_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_dunder_new_function_inlining_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_duplicate_graph_break_log_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_dynamic_sources_dynamic_override_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_dynamic_sources_precedence_over_int_specialization_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_dynamo_compiling_fake_tensor_to_vararg_int_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_dynamo_min_operator_with_shape_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_dynamo_reset_clears_cache_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_empty_list_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_escaping_closure_var_with_backward_hook_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_fail_on_recompile_error_message_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_flat_name_to_original_fqn_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_float_speculation_log_divergence_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_getset_descriptor_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_grad_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_grad_none_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_grad_state_mutated_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_graph_break_compilation_metrics_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_guard_failure_fn2_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_guard_failure_fn_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_guard_filter_fn_by_id_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_guard_filter_globals_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_guard_size_oblivious_backed_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_guard_size_oblivious_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_guard_sym_node_fstring_when_used_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_guards_cse_pass_multiple_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_id_tensor_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_if_cond_nn_mod1_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_if_cond_user_defined_object2_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_if_cond_user_defined_object3_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_if_cond_user_defined_object_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_inline_closure_not_loaded_by_parent_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_inplace_desugaring_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_inplace_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_inplace_param_update_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_inplace_view_on_graph_input_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_inspect_signature_bind_non_user_function_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_inspect_signature_parameters_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_int_list_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_int_neg_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_is_compiling_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_is_floating_point_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_is_tensor_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_is_tensor_like_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_item_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_itertools_accumulate_tensors_builtins_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_itertools_accumulate_tensors_user_defined_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_itertools_infinite_count_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_itertools_infinite_cycle_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_itertools_islice_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_itertools_tee_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_list_append_return_none_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_list_class_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_list_iadd_side_effect_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_list_mul_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_mandelbrot_numpy_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_module_not_callable_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_mro_type_tensor_no_source_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_namedtuple1_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_namedtuple2_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_nan_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_nested_closure_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_nested_frozen_dataclass_hashable_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_nested_function_resuming_with_correct_globals_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_nested_sequential_try_with_graph_break_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_nested_sequential_with_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_nested_wraps_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_nesteduserfunction_setattr_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_nn_functional_reduction_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_nn_sequential_invocation_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_no_error_on_nested_fx_trace_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_no_raise_guard_partial_constraint_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_non_pt2_compliant_ops_graph_break_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_not_dynamic_scope_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_numpy_array_of_arrays_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_numpy_as_global_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_numpy_iter_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_numpy_min_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_numpy_recompilation_scalar_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_numpy_size_attr_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_numpy_torch_operators_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_numpy_unique_f16_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_ordered_dict_alias_reconstruct_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_ordered_dict_move_to_end_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_out_variant_custom_op_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_out_variants_with_resizing_on_graph_inputs_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_out_variants_with_resizing_on_graph_inputs_with_dynamic_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_outside_linear_module_free_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_pair_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_patched_builtin_functions_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_proxy_frozen_dataclass_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_pt2_compliant_overload_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_py_guards_mark_dynamic_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_pytree_tree_leaves_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_raise_guard_full_constraint_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_raises_importerror2_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_real_imag_tensor_attribute_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_recursive_tensor_attribute_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_restore_graphstate_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_running_func_with_captured_func_and_tensor_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_scalar_tensor_is_equivalent_to_int_list_argument_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_shape_env_equal_create_symbolic_sizes_strides_storage_offset_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_shape_env_equal_empty_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_shape_env_equal_evaluate_expr_replacement_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_shape_env_no_recording_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_shape_env_recorded_function_fallback_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_simple_set_usage_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_size_dim_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_size_input_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_str_format_return1_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_super_after_graph_break_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_super_calling_with_metaclass_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_symint_as_device_kwarg_non_strict_export_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_symint_fold_nontrivial_product_modulo_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_tagging_tensors_mix_used_unused_structure_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_tensor_build_list_unpack_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_tensor_hasattr_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_thread_local_setattr_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_tolist_0d_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_tolist_float_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_tolist_kd_dynamic_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_torch_compile_ctx_on_forward_and_training_step_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_torch_dynamo_codegen_pow_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_torch_nn_parameter_isinstance_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_torch_objects_as_keys_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_torch_seed_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_torch_size_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_trace_ndarray_frame_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_tracing_nested_py_tree_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_tracing_nested_py_tree_tuples_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_tracing_py_tree_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_tracing_py_tree_tensor_subclass_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_tuple_class_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_tuple_from_tuple_iter_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_tuple_hasattr_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_tuple_mul_with_shape_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_typing_variable_isinstance_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_unbacked_symint_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_user_defined_object_class_interaction_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_user_function_variable_supports_enum_argument_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_user_function_variable_supports_function_argument_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_user_function_variable_supports_type_abcmeta_argument_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_user_getattribute_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_user_property_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_usr_cls_classmethod_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_variable_tracker_recursively_contains_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_version_ci_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_with_builtin_type_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_writes_to_cells_across_frames1_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_add_sub_alpha_out_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_addr_alpha_beta_out_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_attached_attribute_in_dir_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_batch_norm_act_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_bitwise_print_precedence_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_build_map_unpack_with_call_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_changing_stride_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_contains_range_constprop_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_create_rand_mask_from_inputs_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_dataclass_in_module_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_deleted_compile_wrapper_segfault_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_disabling_unpack_hooks_within_compiled_region_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_distributions_subclass_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_dropout_inline_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_dynamic_shapes_implicit_guard_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_dynamic_shapes_right_side_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_embedding_backward_broadcasting_decomp_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_empty_graph_nested_calls_fullgraph_False_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_exception_in_dynamo_handling_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_exec_import_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_exec_wildcard_import_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_for_loop_graph_break_before_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_foreach_decomp_arg_names_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_function_in_skipfiles_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_gan_repro_trying_to_backward_through_the_graph_a_second_time_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_global_fn_mutation_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_grad_mode_carrying_correct_state_after_graph_break_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_grad_references_cleared_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_graph_break_unsupported_fake_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_guard_ordering_shape_fail_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_hasattr_builtin_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_hf_bigbird_unsqueeze_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_hf_classinstantier_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_hf_model_output_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_hf_t5_forward_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_hf_xsoftmax_training_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_iadd_graph_break_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_indexing_with_list_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_inductor_rng_default_dtype_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_inference_mode_dynamic_shapes_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_inplace_unsqueeze_input_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_invalid_seq_unpack_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_is_symbolic_tracing_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_kwargs_out_list_variable_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_list_aliasing_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_list_index_tensor_unsupported_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_longtensor_list_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_lru_cache_tracing_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_merge_criteria_processor_list1_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_module_in_skipfiles_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_modules_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_named_buffers_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_nested_while_loop_graph_break_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_nn_module_callable_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_nn_module_property_closure_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_nn_parametrize_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_no_tracing_into_eval_frame_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_not_rewrite_assert_for_other_errors_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_nullcontext2_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_odict_get_item_index_name_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_optim_state_references_cleared_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_os_fspath_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_out_overload_non_contiguous_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_out_root_cell_tuple_shape_change_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_overlapping_inputs_with_dynamic_shapes_error_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_partitioner_cse_respects_mutation_boundaries_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_reformer_min_chunk_len_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_reformer_sorting_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_relative_import_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_relative_import_no_modulename_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_requires_grad_guards_with_grad_mode1_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_restricted_list_subclass1_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_return_weakref_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_rng_state_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_seq_append_list_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_setattr_requires_grad_graph_breaks_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_setitem_boolean_mask_diff_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_sigmoid_out_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_slicing_dynamic_shape_setitem_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_sort_out2_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_stk_sdd_is_transposed_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_stop_iteration_reconstruct_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_subclass_graph_output_repro_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_super_diamond_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_symnode_is_not_op_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_symnode_is_op_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_sys_monitoring_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_tensor_set_data_backend_aot_eager_func_name_func3_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_tensor_set_data_backend_eager_func_name_func2_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_tensor_set_data_backend_eager_func_name_func3_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_tensor_set_data_backend_inductor_func_name_func1_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_tensor_split_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_threading_local_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_tokenization_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_torch_compile_in_compile_frame_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_torch_ops_aten_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_torch_tensor_ops_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_torch_tensor_ops_no_graph_break_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_typed_dict_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_typed_dict_total_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_unbind_copy_out_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_unpack_hooks_can_be_disabled_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_vdd_duplicate_error_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_weakref_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_while_loop_graph_break_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_while_loop_graph_break_inside_call_function_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_with_on_graph_break_inst_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesNNModuleTests::test_access_by_keys_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesNNModuleTests::test_basicmodule1_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesNNModuleTests::test_children_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesNNModuleTests::test_constloop_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesNNModuleTests::test_densenet_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesNNModuleTests::test_enumvalues_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesNNModuleTests::test_fnmembercmp1_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesNNModuleTests::test_layerlist_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesNNModuleTests::test_lazy_module_speculation_log_divergence_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesNNModuleTests::test_module_comparison_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesNNModuleTests::test_modulelist_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesNNModuleTests::test_modulelist_nested_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesNNModuleTests::test_modulemethod1_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesNNModuleTests::test_modulemethod2_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesNNModuleTests::test_named_children_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesNNModuleTests::test_nn_module_unspec_int_attr_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesNNModuleTests::test_parameters1_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesNNModuleTests::test_parameters4_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesNNModuleTests::test_parameters5_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesNNModuleTests::test_self_mutating1_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesNNModuleTests::test_stringmember_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesNNModuleTests::test_submodules2_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesNNModuleTests::test_super2_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesNNModuleTests::test_super_class_method_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_capture_symbolic_tracing_simple_within_fake_mode_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_dataclass_input_output_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_dupes_2_with_aten_graph_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_dupes_and_bypass_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_dupes_and_bypass_with_aten_graph_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_dupes_and_bypass_with_non_tensor_arg_with_aten_graph_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_dupes_with_aten_graph_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_dynamic_slicing_invalid_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_dynamo_enum_in_tuple_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_dynamo_list_index_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_export_decomp_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_export_graph_bypass_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_export_graph_bypass_with_aten_graph_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_export_graph_with_list_with_aten_graph_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_export_masking_with_no_grad_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_export_meta_val_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_export_multi_dynamic_dim_constraint_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_export_multi_dynamic_dim_unsafe_relationship_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_export_nn_module_stack_patched_module_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_export_no_raise_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_export_pass_arg_by_name_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_export_persist_assert_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_export_preserves_nn_module_stack_for_get_attr_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_export_raise_guard_full_constraint_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_export_with_args_and_empty_kwargs_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_export_with_args_with_default_tensor_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_export_with_args_with_default_tuple_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_export_with_aten_graph_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_export_with_cond_branches_calling_methods_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_export_with_cond_dynamic_shape_pred_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_export_with_constant_dict_values_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_export_with_constant_free_function_and_class_method_multiarg_diff_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_export_with_constant_method_on_module_invoke_twice_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_export_with_constant_not_none_control_flow_free_func_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_export_with_constant_not_return_const_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_export_with_constant_tuple_nonzero_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_export_with_kwargs_with_default_tensor_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_export_with_map_zero_sized_tensor_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_export_with_nonzero_static_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_export_with_shallow_list_copy_wo_side_effects_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_fx_pytree_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_immutable_list_dict_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_invalid_input_global_multiple_access_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_invalid_input_unused_nonlocal_ok_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_no_tensor_computation_2_with_aten_graph_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_predispatch_with_for_out_dtype_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_preserve_fx_node_metadata_graph_break_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_preserve_fx_node_metadata_inline_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_retracibility_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_subclass_parameters_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_sum_param_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_symbool_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_trivial_constraint_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_zeroes_in_and_out_different_shape_on_test_with_aten_graph_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesSubGraphTests::test_control_flow3_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesSubGraphTests::test_control_flow5_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesSubGraphTests::test_dynamic_order_dependence_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesSubGraphTests::test_dynamic_zero_inference_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesSubGraphTests::test_extended_args_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesSubGraphTests::test_indirect_unsupported1_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesSubGraphTests::test_indirect_unsupported2_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesSubGraphTests::test_multigraph_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesSubGraphTests::test_restore_range_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesSubGraphTests::test_resume3_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesSubGraphTests::test_resume4_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesSubGraphTests::test_resume5_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesSubGraphTests::test_resume_paths_join_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesSubGraphTests::test_resume_with_no_grad2_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesSubGraphTests::test_start1_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesSubGraphTests::test_start2_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesSubGraphTests::test_start3_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesSubGraphTests::test_tuple_iterator_mutate_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesSubGraphTests::test_tuple_iterator_return_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_allow_python_side_effects_utility_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_cond_subgraph_name_is_valid_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_cond_with_constant_pred_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_cond_with_empty_operands_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_flat_list_output_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_grad_source_fn_stack_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_hints_wrapper_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_hints_wrapper_no_hints_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_hopify_generic_wrap_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_lift_tensors_with_compound_expressions_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_make_closure_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_map_example_value_metadata_consistent_with_eager_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_map_kwargs_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_map_lowers_to_graph_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_map_source_fn_stack_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_map_symint_input_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_nested_tuple_output_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_register_subclass_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_return_captured_var_used_multiple_times_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_side_effect_local_list_append_no_graph_break_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_side_effect_mutate_global_tensor_builtin_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_side_effect_mutate_nonlocal_num_builtin_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_side_effect_set_existing_attr_global_module_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_symint_in_slice_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_symint_input_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_tensor_and_unbacked_symbol_closure_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_vmap_multiply_scalar_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_wrap_kwarg_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_wrap_kwarg_only_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_wrap_pytree_args_nested_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_wrap_pytree_kwargs_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesHigherOrderOpTests::test_wrap_subgraph_name_is_valid_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFuncTorchHigherOrderOpTests::test_functional_call_disable_inline_nn_module_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFuncTorchHigherOrderOpTests::test_functional_call_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFuncTorchHigherOrderOpTests::test_grad_call_compiled_backward_fn_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFuncTorchHigherOrderOpTests::test_grad_closure_scalar_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFuncTorchHigherOrderOpTests::test_grad_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFuncTorchHigherOrderOpTests::test_grad_fn_with_kwargs_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFuncTorchHigherOrderOpTests::test_grad_freevar_python_scalar_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFuncTorchHigherOrderOpTests::test_grad_recompile_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFuncTorchHigherOrderOpTests::test_grad_two_tensor_all_grad_has_aux_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFuncTorchHigherOrderOpTests::test_grad_two_tensor_has_aux_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFuncTorchHigherOrderOpTests::test_hessian_argnums_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFuncTorchHigherOrderOpTests::test_hessian_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFuncTorchHigherOrderOpTests::test_jacfwd_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFuncTorchHigherOrderOpTests::test_jacfwd_has_aux_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFuncTorchHigherOrderOpTests::test_jacfwd_two_tensors_argnums_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFuncTorchHigherOrderOpTests::test_jacrev_two_tensors_argnums_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFuncTorchHigherOrderOpTests::test_jvp_call_torch_compile_fn_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFuncTorchHigherOrderOpTests::test_jvp_jvp_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFuncTorchHigherOrderOpTests::test_jvp_simple_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFuncTorchHigherOrderOpTests::test_jvp_two_tensors_disable_enable_disable_grad_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFuncTorchHigherOrderOpTests::test_jvp_two_tensors_disable_grad_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFuncTorchHigherOrderOpTests::test_jvp_two_tensors_has_aux_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFuncTorchHigherOrderOpTests::test_vjp_has_aux_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFuncTorchHigherOrderOpTests::test_vjp_multiple_outputs_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFuncTorchHigherOrderOpTests::test_vmap_call_compiled_backward_fn_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFuncTorchHigherOrderOpTests::test_vmap_free_const_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFuncTorchHigherOrderOpTests::test_vmap_get_wrapped_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFuncTorchHigherOrderOpTests::test_vmap_kwargs_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFuncTorchHigherOrderOpTests::test_vmap_multiple_invocation_out_dims_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFuncTorchHigherOrderOpTests::test_vmap_multiple_outputs_diff_dims_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFuncTorchHigherOrderOpTests::test_vmap_multiple_outputs_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFuncTorchHigherOrderOpTests::test_vmap_new_tensor_implicit_via_op_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFuncTorchHigherOrderOpTests::test_vmap_out_dims_None_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFuncTorchHigherOrderOpTests::test_vmap_over_vmap_captured_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFuncTorchHigherOrderOpTests::test_vmap_over_vmap_two_inputs_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFuncTorchHigherOrderOpTests::test_vmap_previous_illegal_op_no_graph_break_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFuncTorchHigherOrderOpTests::test_vmap_recompile_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFuncTorchHigherOrderOpTests::test_vmap_recompile_same_config_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFuncTorchHigherOrderOpTests::test_vmap_side_effects_append_input_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFuncTorchHigherOrderOpTests::test_vmap_two_inputs_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesFuncTorchHigherOrderOpTests::test_vmap_with_conditional_graph_break_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesAotAutogradFallbackTests::test_arg_dupe_via_dynamo_recompiles_many_args_param_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesAotAutogradFallbackTests::test_arg_dupe_via_dynamo_recompiles_many_args_param_non_tensor_arg_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesAotAutogradFallbackTests::test_arg_dupe_via_dynamo_recompiles_many_args_param_non_tensor_arg_list_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesAotAutogradFallbackTests::test_autograd_function_tangent_mutation_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesAotAutogradFallbackTests::test_call_fn_with_non_const_inputs_aot_unsafe_control_flow_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesAotAutogradFallbackTests::test_data_ptr_access_copy_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesAotAutogradFallbackTests::test_data_ptr_access_fails_in_forward_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesAotAutogradFallbackTests::test_donated_buffer3_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesAotAutogradFallbackTests::test_donated_buffer6_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesAotAutogradFallbackTests::test_inputs_overlapping_with_mutation_recompile_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesAotAutogradFallbackTests::test_inputs_overlapping_with_mutation_stress_dynamic_shapes, test/dynamo/test_dynamic_shapes.py::DynamicShapesTestSDPA::test_input_SDPAParams_dynamic_shapes 2025-08-14T22:13:52.9384457Z 2025-08-14T22:13:52.9384787Z Running inductor/test_compiled_autograd 3/3 ... [2025-08-14 22:13:52.864008] 2025-08-14T22:13:52.9385390Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T22:13:52.9386822Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_compiled_autograd.py', '-m', 'not serial', '--shard-id=3', '--num-shards=3', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 22:13:52.864341] 2025-08-14T22:13:54.0084037Z 2025-08-14T22:13:54.0085558Z inductor/test_mkldnn_pattern_matcher 2/2 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_mkldnn_pattern_matcher_2.2_5c100d6773acfc34_.log 2025-08-14T22:13:54.0216803Z Running 135 items in this shard: test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_conv2d_add_scalar, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_conv2d_binary_inplace_fusion_failed_cpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_conv2d_binary_inplace_fusion_pass_cpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_bfloat16_dynamic_False_reshape_a_False_M_1_inplace_add_False_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_bfloat16_dynamic_False_reshape_a_False_M_1_inplace_add_True_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_bfloat16_dynamic_False_reshape_a_False_M_1_inplace_add_True_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_bfloat16_dynamic_False_reshape_a_False_M_32_inplace_add_True_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_bfloat16_dynamic_False_reshape_a_True_M_1_inplace_add_False_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_bfloat16_dynamic_False_reshape_a_True_M_1_inplace_add_False_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_bfloat16_dynamic_False_reshape_a_True_M_1_inplace_add_True_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_bfloat16_dynamic_False_reshape_a_True_M_1_inplace_add_True_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_bfloat16_dynamic_False_reshape_a_True_M_32_inplace_add_True_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_bfloat16_dynamic_False_reshape_a_True_M_32_inplace_add_True_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_bfloat16_dynamic_True_reshape_a_False_M_1_inplace_add_True_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_bfloat16_dynamic_True_reshape_a_False_M_32_inplace_add_False_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_bfloat16_dynamic_True_reshape_a_False_M_32_inplace_add_False_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_bfloat16_dynamic_True_reshape_a_True_M_1_inplace_add_False_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_bfloat16_dynamic_True_reshape_a_True_M_32_inplace_add_False_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_bfloat16_dynamic_True_reshape_a_True_M_32_inplace_add_False_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_bfloat16_dynamic_True_reshape_a_True_M_32_inplace_add_True_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_float32_dynamic_False_reshape_a_False_M_1_inplace_add_True_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_float32_dynamic_False_reshape_a_False_M_32_inplace_add_False_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_float32_dynamic_False_reshape_a_False_M_32_inplace_add_True_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_float32_dynamic_False_reshape_a_True_M_1_inplace_add_False_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_float32_dynamic_False_reshape_a_True_M_1_inplace_add_True_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_float32_dynamic_False_reshape_a_True_M_1_inplace_add_True_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_float32_dynamic_False_reshape_a_True_M_32_inplace_add_False_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_float32_dynamic_True_reshape_a_False_M_1_inplace_add_True_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_float32_dynamic_True_reshape_a_False_M_1_inplace_add_True_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_float32_dynamic_True_reshape_a_False_M_32_inplace_add_False_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_float32_dynamic_True_reshape_a_False_M_32_inplace_add_False_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_float32_dynamic_True_reshape_a_False_M_32_inplace_add_True_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_float32_dynamic_True_reshape_a_True_M_1_inplace_add_False_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_float32_dynamic_True_reshape_a_True_M_1_inplace_add_False_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_float32_dynamic_True_reshape_a_True_M_32_inplace_add_False_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_float32_dynamic_True_reshape_a_True_M_32_inplace_add_False_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_bfloat16_dynamic_False_reshape_a_False_M_1_inplace_add_False_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_bfloat16_dynamic_False_reshape_a_False_M_1_inplace_add_True_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_bfloat16_dynamic_False_reshape_a_False_M_1_inplace_add_True_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_bfloat16_dynamic_False_reshape_a_False_M_32_inplace_add_False_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_bfloat16_dynamic_False_reshape_a_False_M_32_inplace_add_False_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_bfloat16_dynamic_False_reshape_a_False_M_32_inplace_add_True_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_bfloat16_dynamic_False_reshape_a_False_M_32_inplace_add_True_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_bfloat16_dynamic_False_reshape_a_True_M_1_inplace_add_False_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_bfloat16_dynamic_False_reshape_a_True_M_1_inplace_add_True_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_bfloat16_dynamic_False_reshape_a_True_M_1_inplace_add_True_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_bfloat16_dynamic_False_reshape_a_True_M_32_inplace_add_False_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_bfloat16_dynamic_False_reshape_a_True_M_32_inplace_add_True_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_bfloat16_dynamic_True_reshape_a_False_M_1_inplace_add_False_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_bfloat16_dynamic_True_reshape_a_False_M_32_inplace_add_False_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_bfloat16_dynamic_True_reshape_a_False_M_32_inplace_add_True_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_bfloat16_dynamic_True_reshape_a_True_M_1_inplace_add_False_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_bfloat16_dynamic_True_reshape_a_True_M_1_inplace_add_True_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_bfloat16_dynamic_True_reshape_a_True_M_1_inplace_add_True_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_bfloat16_dynamic_True_reshape_a_True_M_32_inplace_add_True_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_float32_dynamic_False_reshape_a_False_M_1_inplace_add_False_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_float32_dynamic_False_reshape_a_False_M_32_inplace_add_True_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_float32_dynamic_False_reshape_a_True_M_1_inplace_add_False_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_float32_dynamic_False_reshape_a_True_M_1_inplace_add_True_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_float32_dynamic_False_reshape_a_True_M_32_inplace_add_False_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_float32_dynamic_False_reshape_a_True_M_32_inplace_add_False_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_float32_dynamic_False_reshape_a_True_M_32_inplace_add_True_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_float32_dynamic_True_reshape_a_False_M_1_inplace_add_False_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_float32_dynamic_True_reshape_a_False_M_32_inplace_add_False_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_float32_dynamic_True_reshape_a_False_M_32_inplace_add_False_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_float32_dynamic_True_reshape_a_True_M_1_inplace_add_False_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_float32_dynamic_True_reshape_a_True_M_1_inplace_add_False_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_float32_dynamic_True_reshape_a_True_M_32_inplace_add_True_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_dynamic_qlinear_cpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_linear_add_bias, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_linear_binary, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_linear_dynamic_fp16, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_linear_fp32, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_linear_input_non_contiguous_3D_wo_bias, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_multi_linear_share_same_input, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qat_qconv2d, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qat_qconv2d_add, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qat_qconv2d_add_relu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qat_qconv2d_hardswish, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qat_qconv2d_hardtanh, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qat_qconv2d_silu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qconv1d_relu_cpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qconv2d_add_broadcast_shapes_cpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qconv2d_add_int8_mixed_bf16, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qconv2d_add_int8_mixed_bf16_xpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qconv2d_dequant_promotion_cpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qconv2d_hardswish_cpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qconv2d_hardswish_int8_mixed_bf16_cpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qconv2d_hardswish_int8_mixed_bf16_xpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qconv2d_hardtanh_cpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qconv2d_relu_xpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qconv2d_silu_cpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qconv2d_silu_int8_mixed_bf16_xpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qconv2d_silu_xpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qconv2d_xpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_add_cpu_use_relu_False_is_qat_True_is_dynamic_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_add_cpu_use_relu_True_is_qat_False_is_dynamic_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_add_cpu_use_relu_True_is_qat_False_is_dynamic_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_add_int8_mixed_bf16_use_relu_True_is_qat_False_is_dynamic_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_add_int8_mixed_bf16_use_relu_True_is_qat_False_is_dynamic_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_add_int8_mixed_bf16_use_relu_True_is_qat_True_is_dynamic_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_add_int8_mixed_bf16_xpu_use_relu_False_is_qat_False_is_dynamic_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_dequant_promotion_cpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_dequant_promotion_cpu_input_dim_exceeds_2, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_dequant_promotion_dynamic_cpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_dequant_promotion_int8_mixed_bf16_xpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_dequant_promotion_xpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_fp8_inductor_cpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_gelu_cpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_gelu_int8_mixed_bf16, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_gelu_int8_mixed_bf16_xpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_int8_mixed_bf16, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_int8_mixed_bf16_input_dim_exceeds_2_and_not_contiguous, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_int8_mixed_bf16_input_dim_exceeds_2_and_not_contiguous_xpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_int8_mixed_bf16_input_dim_exceeds_2_xpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_int8_mixed_bf16_xpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_mul, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_relu_cpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_relu_input_dim_exceeds_2, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_relu_int8_mixed_bf16, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_relu_int8_mixed_bf16_input_dim_exceeds_2, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_relu_int8_mixed_bf16_input_dim_exceeds_2_xpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_xpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_reproduce_113440_issue_1, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_reproduce_113440_issue_2, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_reproduce_99842_issue, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_smooth_quant_with_int_mm_has_bias_False_float32_per_channel_quant_False_dynamic_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_smooth_quant_with_int_mm_has_bias_False_float32_per_channel_quant_False_dynamic_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_smooth_quant_with_int_mm_has_bias_True_bfloat16_per_channel_quant_True_dynamic_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_smooth_quant_with_int_mm_has_bias_True_bfloat16_per_channel_quant_True_dynamic_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_smooth_quant_with_int_mm_has_bias_True_float32_per_channel_quant_True_dynamic_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_woq_int4_cpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_woq_int8, test/inductor/test_mkldnn_pattern_matcher.py::TestDynamicPatternMatcher::test_linear_unary_dynamic_shapes, test/inductor/test_mkldnn_pattern_matcher.py::TestDynamicPatternMatcher::test_qat_bn_conv2d 2025-08-14T22:13:54.0344768Z 2025-08-14T22:13:54.0345154Z Running inductor/test_control_flow 1/2 ... [2025-08-14 22:13:54.008953] 2025-08-14T22:13:54.0345907Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T22:13:54.0347757Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_control_flow.py', '-m', 'not serial', '--shard-id=1', '--num-shards=2', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 22:13:54.009637] 2025-08-14T22:21:26.5400989Z 2025-08-14T22:21:26.5402287Z inductor/test_compiled_autograd 3/3 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_compiled_autograd_3.3_891cab9bd273fb94_.log 2025-08-14T22:21:26.5502144Z Running 295 items in this shard: test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_accumulate_grad_accuracy, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_accumulate_grad_polyfill_case_1_1, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_accumulate_grad_polyfill_case_1_5_2, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_accumulate_grad_polyfill_case_2_1, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_aot_bwd_gm_runnable, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_autograd_cpp_node_data_dependent_is_traceable_False, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_autograd_cpp_node_id_is_traceable_False, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_autograd_cpp_node_saved_basic_is_traceable_False, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_autograd_cpp_node_saved_float_is_traceable_False, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_autograd_cpp_node_saved_float_is_traceable_True, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_backward_hook_relative_ordering_partial, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_callback_graph_break_throws_error, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_checkpointing_simple_reentrant_True, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_compile_api_api_optimize_backend_eager, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_compile_api_api_optimize_backend_inductor, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_compile_api_disable_api_compile_backend_inductor, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_cudagraphs_cpu_scalar_used_in_python_custom_op, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_cudagraphs_sdpa, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_custom_fn_dynamically_defined_class, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_custom_fn_output_metadata, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_custom_fn_saved_attr, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_custom_fn_saved_multiple_tensors_dedup, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_custom_fn_saved_tensors, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_custom_fn_with_same_graph, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_dont_dce_side_effects, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_dynamic_shapes_eager_node, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_dynamo_boxed, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_free_activation_memory, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_free_activation_memory_subclass, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_higher_order_gradients, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_keep_graph_simple, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_keep_graph_usage_after_compiled, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_logs, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_mismatch_fake_tensor_mode, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_mismatch_fake_tensor_mode_dynamic_shape, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_nested_compile, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_no_output_nodes_all_leaves, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_no_output_nodes_some_leaves, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_output_nodes_all_leaves, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_reorder_acc_grad, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_reorder_multi_post_hooks, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_reorder_post_hook1, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_reorder_post_hook3, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_tensor_grad_hook1, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_tensor_grad_hook2, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_tensor_grad_hook3, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_tensor_subclass_basic, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_torch_compile_graph_break, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_trace_auto_functionalized, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_trace_auto_functionalized_v2, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_trace_run_with_rng_state, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_verbose_logs_cpp, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_verbose_logs_dynamic_shapes, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_verbose_logs_snapshot, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_accumulate_grad_posthooks_can_observe_tensor_prehook, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_accumulate_grad_tensor_reference, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_autograd_multiple_views_python, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_backward, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_backward_create_graph_warns, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_backward_twice_retained_graph_without_saved_values, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_backward_twice_with_saved_values, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_calculate_shape_util, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_callback_adds_callback, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_callback_propagates_errors_from_device_thread, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_checkpointing_non_reentrant_autocast_gpu, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_checkpointing_without_reentrant_arbitrary_input_output, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_checkpointing_without_reentrant_detached_tensor_use_reentrant_False, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_checkpointing_without_reentrant_input_requires_grad_False, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_checkpointing_without_reentrant_input_requires_grad_True, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_checkpointing_without_reentrant_parameter_used_in_an_out, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_current_graph_task_execution_order, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_current_graph_task_id, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_custom_function_error, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_custom_function_exception, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_custom_function_forward_mode_forward_is_no_op, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_custom_function_forward_mode_non_differentiable, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_custom_function_inplace_on_non_default_view, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_custom_function_inplace_on_view_of_leaf, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_custom_function_return_view_in_nograd, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_custom_function_setup_context_simple, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_deep_reentrant, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_dep_nograd, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_dependent_backward, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_detach_base, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_detach_then_inplace_raises_in_autograd, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_diagonal_expanded_v, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_dir, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_disabling_saved_tensor_hooks_nested, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_dont_materialize_grads, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_enable_grad_decorator_no_paren, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_first_grad_fn_access_in_no_grad_mode, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_function_returns_undefined_tensor, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_grad, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_grad_badcalls, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_grad_batched_grad, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_grad_fn_badcalls, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_grad_fn_input_metadata, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_grad_fn_prehooks, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_grad_fn_prehooks_remove_hooks, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_grad_materialize_grads, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_grad_to_node, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_grad_to_node_inplace, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_grad_to_node_materialize, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_gradcheck_check_batched_grad, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_gradcheck_check_forward_or_backward_only, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_gradcheck_check_no_differentiable_outputs, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_gradcheck_dense_and_sparse_inputs, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_gradcheck_forward_ad_respects_requires_grad, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_gradcheck_get_analytical_jacobian, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_gradcheck_input_layout4, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_gradcheck_output_shape_or_dtype_depend_on_values, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_hook_closure_cycle_use_custom_function_False_use_tensor_hook_False, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_hook_closure_cycle_use_custom_function_False_use_tensor_hook_True, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_hook_edge_case_when_called_with_grad, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_hooks, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_hooks_cpp, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_inplace_on_view_backward, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_inplace_on_view_saved_output, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_inplace_on_view_weak_grad_fn, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_input_buffer_accum, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_integer_outputs, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_isolated_node, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_legacy_function_deprecation_exception, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_multi_grad_all_hooks, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_named_tensor_for_complex_views, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_naughty_anomaly_access, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_naughty_autograd_function_stashing_ctx, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_nested_anomaly_detect_nan, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_no_grad, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_no_grad_modifies_version, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_no_requires_grad_inplace, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_no_unnecessary_save, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_no_unnecessary_unwrapping, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_node_ordering_when_none_returned, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_node_post_hook_registered_during_unpack_hook, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_once_differentiable, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_post_accumulate_grad_hook_e2e, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_post_accumulate_grad_hook_gets_cleaned_up, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_power_function, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_profiler, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_profiler_aggregation_lstm, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_profiler_aggregation_table, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_pynode_destruction_deadlock, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_record_function_multithreaded, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_reentrant_child_error, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_reentrant_with_callbacks_depth_0, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_reentrant_with_callbacks_depth_1, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_reentrant_with_leaf_variable_hook, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_requires_grad_inplace, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_retain_grad, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_retain_grad_cycle, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_retain_grad_inplace, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_retains_grad_inplace_multiple_outputs, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_return_duplicate, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_return_leaf, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_saved_tensor_hooks_custom_error_propagation, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_saved_tensor_hooks_extra_exit_during_bw_no_crash, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_saved_tensors_hook_version_counter_not_shared, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_saved_variable_packing_unpacking_did_not_save_original_with_default_hooks, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_saved_variable_packing_unpacking_saved_original_with_default_hooks, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_saving_variable_to_disk, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_scalar_grad_mixed_device, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_select_expanded_v, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_set_data_tensorimpl_type, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_set_grad_coroutines_exit, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_set_grad_enabled, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_setitem, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_setting_default_saved_variable_hooks_twice_should_not_fail, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_setup_context_when_forward_has_default_args, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_simple_reentrant, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_slice_expanded_v, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_sparse_gather_both_scalar, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_sparse_gather_dim0, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_sparse_gather_dim_neg, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_tensor_hooks_inplace_multiple_outputs, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_var_mean_differentiable, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_version_counter, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_will_engine_execute_node, test/inductor/test_compiled_autograd.py::TestNestedCheckpointWithCompiledAutograd::test_nested_checkpoint_non_tensor_inputs_and_outputs_early_stop_False, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_abstract_impl_on_existing_op, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_abstract_impl_on_existing_op_with_meta, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_autograd_function_backed_op, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_backward_dict_invalid_keys, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_backward_dict_requires_keys_for_input_optional_tensors, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_backward_dict_requires_keys_for_input_tensors, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_backward_impl_on_existing_op_with_key_key_Autograd, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_backward_output_differentiability_tensorlist, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_backward_output_differentiability_type, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_backward_partially_registered, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_backward_tensorlist_input_requires_list_grads_none_or_Tensor, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_builtin_aten_ops_are_pt2_compliant, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_data_dependent_basic, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_functionalize_error, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_impl_abstract_overload, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_impl_device_cuda, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_impl_function, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_impl_invalid_devices, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_infer_schema_no_return, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_infer_schema_supported, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_legacy_define, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_legacy_impl, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_lifetime, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_meta_for_data_dependent_shape_operation, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_name_must_match, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_override_cea, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_override_impl, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_reserved_ns, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_resolve_packet, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_sequences, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_unsupported_param_types, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_capture_global_num, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_capture_numpy_number, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_cond_free_variable_in_both_branches, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_cond_pytree_operands, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_cond_source_fn_stack, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_cond_with_constant_pred, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_dynamic_shapes_over_vmap_batch_size, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_enum_arg, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_fallback_on_graph_break_complicated, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_fn_with_kwargs_in_torch_ops, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_freevars_as_inputs_to_wrap, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_grad_source_fn_stack, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_hints_wrapper_incorrect_type, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_hints_wrapper_pytree_inputs, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_inlined_functions, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_lift_tensor_constant, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_lift_tensors_with_compound_expressions, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_make_closure, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_map_example_value_metadata_consistent_with_eager, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_map_side_effect, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_map_subgraph_name_is_valid, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_nested_tuple_output, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_no_freevars, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_output_with_dict, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_return_captured_vars, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_same_freevar_twice, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_side_effect_del_existing_attr_global_obj, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_side_effect_del_existing_attr_nonlocal_module, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_side_effect_in_body, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_side_effect_mutate_global_list, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_side_effect_mutate_global_num_builtin, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_side_effect_mutate_nonlocal_num, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_side_effect_set_new_attr_global_obj, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_side_effect_set_new_attr_nonlocal_module, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_side_effect_set_new_attr_nonlocal_obj, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_symint_input, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_tensor_and_unbacked_symbol_closure, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_unbacked_symbol_closure, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_vmap_multiply_scalar, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_vmap_source_fn_stack, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_wrap_kwarg, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_wrap_pytree_kwargs, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_functional_call, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_grad_call_torch_compile_fn, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_grad_capture_tensor, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_grad_closure_scalar, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_grad_freevar_tensor, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_grad_has_aux, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_grad_over_grad, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_grad_recompile, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_grad_two_tensor_has_aux, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_jacfwd, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_jacfwd_has_aux, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_jacfwd_two_tensors_argnums, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_jacrev, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_jacrev_has_aux, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_jacrev_two_tensors_argnums, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_jvp_call_torch_compile_fn, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_jvp_freevar_tensor, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_jvp_simple, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_jvp_two_tensors_has_aux, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_vjp_multiple_outputs, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_vmap, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_vmap_call_torch_compile_fn, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_vmap_multiple_invocation_in_dims, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_vmap_new_tensor_unused_in_body, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_vmap_recompile, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_vmap_recompile_different_config, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_vmap_recompile_same_config, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_vmap_side_effects_append_input, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_vmap_two_inputs_tuple_in_dims, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_vmap_with_graph_break, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_vmap_with_graph_break_lambda, test/inductor/test_compiled_autograd.py::ActivationCheckpointingTestsWithCompiledAutograd::test_cond_with_kwargs, test/inductor/test_compiled_autograd.py::ActivationCheckpointingTestsWithCompiledAutograd::test_function, test/inductor/test_compiled_autograd.py::TestDTensorCompileWithCompiledAutograd::test_dtensor_contiguous_dtensor_noncontiguous_local_as_tangent, test/inductor/test_compiled_autograd.py::TestDTensorCompileWithCompiledAutograd::test_dtensor_dont_recompile_on_same_placement_devicemesh, test/inductor/test_compiled_autograd.py::TestDTensorCompileWithCompiledAutograd::test_dtensor_dynamo_device_mesh_attrs, test/inductor/test_compiled_autograd.py::TestDTensorCompileWithCompiledAutograd::test_dtensor_partial_placement_graph_output, test/inductor/test_compiled_autograd.py::TestDTensorCompileWithCompiledAutograd::test_dynamo_to_local_kwargs_forward_hook, test/inductor/test_compiled_autograd.py::TestDTensorCompileWithCompiledAutograd::test_tp_compile_comm_reordering, test/inductor/test_compiled_autograd.py::TestDTensorCompileWithCompiledAutograd::test_tp_compile_comm_reordering_graph_partition, test/inductor/test_compiled_autograd.py::TestCompiledAutogradOpInfoCUDA::test_hops_in_bwd_flex_attention_backward_simple_cuda_float32, test/inductor/test_compiled_autograd.py::TestCompiledAutogradOpInfoCUDA::test_hops_in_bwd_flex_attention_simple_cuda_float32, test/inductor/test_compiled_autograd.py::TestCompiledAutogradOpInfoCUDA::test_hops_in_bwd_invoke_subgraph_simple_cuda_float32 2025-08-14T22:21:26.5604429Z 2025-08-14T22:21:26.5604578Z Running inductor/test_halide 1/1 ... [2025-08-14 22:21:26.540838] 2025-08-14T22:21:26.5604885Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T22:21:26.5605641Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_halide.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 22:21:26.541185] 2025-08-14T22:21:32.4030343Z 2025-08-14T22:21:32.4031990Z inductor/test_halide 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_halide_1.1_df953a26826921f6_.log 2025-08-14T22:21:32.4032966Z 2025-08-14T22:21:32.4033298Z Running inductor/test_flex_decoding 4/4 ... [2025-08-14 22:21:32.402403] 2025-08-14T22:21:32.4033937Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T22:21:32.4035595Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_flex_decoding.py', '-m', 'not serial', '--shard-id=4', '--num-shards=4', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 22:21:32.402674] 2025-08-14T22:24:05.7690708Z 2025-08-14T22:24:05.7694288Z inductor/test_control_flow 1/2 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_control_flow_1.2_840159345121bdaa_.log 2025-08-14T22:24:05.7792008Z Running 206 items in this shard: test/inductor/test_control_flow.py::CondTests::test_cond_advanced_dynamic_shapes_device_cuda, test/inductor/test_control_flow.py::CondTests::test_cond_aliasing_outputs, test/inductor/test_control_flow.py::CondTests::test_cond_decompose_ops_in_subgraph_device_cpu, test/inductor/test_control_flow.py::CondTests::test_cond_functional_call_device_cpu_dynamic_False, test/inductor/test_control_flow.py::CondTests::test_cond_inductor_fx_passes_recursively_applied, test/inductor/test_control_flow.py::CondTests::test_cond_mismatched_branch_output_size_device_cpu_dynamic_True, test/inductor/test_control_flow.py::CondTests::test_cond_mismatched_branch_output_size_device_cuda_dynamic_False, test/inductor/test_control_flow.py::CondTests::test_cond_mismatched_branch_output_size_device_cuda_dynamic_True, test/inductor/test_control_flow.py::CondTests::test_cond_multiple_outputs_device_cuda_dynamic_False, test/inductor/test_control_flow.py::CondTests::test_cond_multiple_outputs_device_cuda_dynamic_True, test/inductor/test_control_flow.py::CondTests::test_cond_nested_control_flow_device_cuda_dynamic_False, test/inductor/test_control_flow.py::CondTests::test_cond_non_tensor_predicates_device_cuda_dynamic_False, test/inductor/test_control_flow.py::CondTests::test_cond_non_tensor_predicates_device_cuda_dynamic_True, test/inductor/test_control_flow.py::CondTests::test_cond_outer_code_before_after_device_cpu_dynamic_False, test/inductor/test_control_flow.py::CondTests::test_cond_outer_code_before_after_device_cuda_dynamic_False, test/inductor/test_control_flow.py::CondTests::test_cond_outer_code_before_after_device_cuda_dynamic_True, test/inductor/test_control_flow.py::CondTests::test_cond_reintepret_view_inputs_outputs, test/inductor/test_control_flow.py::CondTests::test_cond_simple_control_flow_device_cpu_dynamic_True, test/inductor/test_control_flow.py::CondTests::test_cond_simple_with_int_closure_device_cpu, test/inductor/test_control_flow.py::CondTests::test_cond_subgraphs_with_parameters_device_cpu_dynamic_False, test/inductor/test_control_flow.py::CondTests::test_cond_subgraphs_with_parameters_device_cpu_dynamic_True, test/inductor/test_control_flow.py::CondTests::test_cond_unbacked_symint_closure_device_cpu_dynamic_False, test/inductor/test_control_flow.py::CondTests::test_cond_unbacked_symint_closure_device_cuda_dynamic_False, test/inductor/test_control_flow.py::CondTests::test_cond_unbacked_symint_closure_device_cuda_dynamic_True, test/inductor/test_control_flow.py::CondTests::test_cond_unbacked_symint_inner_device_cuda, test/inductor/test_control_flow.py::CondTests::test_cond_unbacked_symint_inner_to_outer_device_cpu, test/inductor/test_control_flow.py::CondTests::test_cond_unbacked_symint_inner_to_outer_device_cuda, test/inductor/test_control_flow.py::CondTests::test_cond_unbacked_symint_outer_to_inner_device_cuda, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_infinite_loop_error, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_nested_control_flow_device_cpu_dynamic_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_simple_control_flow_device_cpu_dynamic_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_simple_control_flow_device_cuda_dynamic_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_simple_control_flow_device_cuda_dynamic_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_conv_device_cpu_dynamic_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_data_dependent_in_out_device_cpu_dynamic_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_data_dependent_in_out_device_cpu_dynamic_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_data_dependent_in_out_device_cuda_dynamic_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_data_dependent_in_out_device_cuda_dynamic_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_data_dependent_ops_device_cpu_dynamic_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_data_dependent_ops_device_cpu_dynamic_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_data_dependent_ops_device_cuda_dynamic_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_outer_buffers_device_cuda_dynamic_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_parameters_device_cuda_dynamic_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_pytree_inputs_device_cpu_dynamic_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_pytree_inputs_device_cuda_dynamic_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_sym_expr_cond_device_cpu_dynamic_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_sym_expr_cond_device_cuda_dynamic_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_sym_expr_cond_device_cuda_dynamic_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_unbacked_symint_closure_device_cpu_dynamic_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_zero_loop_device_cpu_dynamic_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_zero_loop_device_cpu_dynamic_True, test/inductor/test_control_flow.py::AssociativeScanTests::test_associative_scan_CUDA_flip_combine_mode_generic_backend_inductor_cpu, test/inductor/test_control_flow.py::AssociativeScanTests::test_associative_scan_CUDA_flip_combine_mode_pointwise_backend_inductor_cpu, test/inductor/test_control_flow.py::AssociativeScanTests::test_associative_scan_CUDA_flip_combine_mode_pointwise_backend_inductor_device_cuda, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_False_reverse_False_dim_0_scan_length_1, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_False_reverse_False_dim_1_scan_length_1, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_False_reverse_False_dim_3_scan_length_1, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_False_reverse_False_dim_3_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_False_reverse_True_dim_1_scan_length_1, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_False_reverse_True_dim_1_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_False_reverse_True_dim_3_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_True_reverse_False_dim_0_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_True_reverse_False_dim_1_scan_length_1, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_True_reverse_False_dim_3_scan_length_1, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_True_reverse_False_dim_3_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_True_reverse_True_dim_0_scan_length_1, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_True_reverse_True_dim_0_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_True_reverse_True_dim_1_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_False_reverse_False_dim_0_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_False_reverse_False_dim_1_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_False_reverse_True_dim_1_scan_length_1, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_False_reverse_True_dim_1_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_False_reverse_True_dim_3_scan_length_1, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_False_reverse_True_dim_3_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_True_reverse_False_dim_0_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_True_reverse_False_dim_1_scan_length_1, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_True_reverse_False_dim_1_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_True_reverse_True_dim_0_scan_length_1, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_True_reverse_True_dim_0_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_True_reverse_True_dim_1_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_scan_chunked_ce_device_cpu_dynamic_False, test/inductor/test_control_flow.py::ScanTests::test_scan_chunked_ce_device_cpu_dynamic_True, test/inductor/test_control_flow.py::ScanTests::test_scan_chunked_ce_device_cuda_dynamic_False, test/inductor/test_control_flow.py::ScanTests::test_scan_chunked_ce_device_cuda_dynamic_True, test/inductor/test_control_flow.py::ScanTests::test_scan_compare_chunked_ce_with_no_scan_device_cpu_dynamic_True, test/inductor/test_control_flow.py::ScanTests::test_scan_compare_chunked_ce_with_no_scan_device_cuda_dynamic_False, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_False_reverse_False_dim_0_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_False_reverse_False_dim_3_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_False_reverse_True_dim_0_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_False_reverse_True_dim_1_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_False_reverse_True_dim_3_scan_length_1, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_False_reverse_True_dim_3_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_True_reverse_False_dim_1_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_True_reverse_False_dim_3_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_True_reverse_True_dim_0_scan_length_1, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_True_reverse_True_dim_0_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_True_reverse_True_dim_3_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_False_reverse_False_dim_1_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_False_reverse_False_dim_3_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_False_reverse_True_dim_0_scan_length_1, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_False_reverse_True_dim_1_scan_length_1, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_False_reverse_True_dim_1_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_False_reverse_True_dim_3_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_True_reverse_False_dim_0_scan_length_1, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_True_reverse_False_dim_0_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_True_reverse_False_dim_1_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_True_reverse_False_dim_3_scan_length_1, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_True_reverse_True_dim_0_scan_length_1, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_True_reverse_True_dim_1_scan_length_1, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_True_reverse_True_dim_1_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_True_reverse_True_dim_3_scan_length_1, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_False_dim_0_pred_False_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_False_dim_1_pred_False_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_False_dim_1_pred_True_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_False_dim_3_pred_False_scan_length_1, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_False_dim_3_pred_False_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_False_dim_3_pred_True_scan_length_1, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_True_dim_0_pred_True_scan_length_1, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_True_dim_0_pred_True_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_True_dim_1_pred_True_scan_length_1, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_True_dim_3_pred_False_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_True_dim_3_pred_True_scan_length_1, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_True_dim_3_pred_True_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_False_dim_0_pred_False_scan_length_1, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_False_dim_0_pred_False_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_False_dim_1_pred_False_scan_length_1, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_False_dim_1_pred_False_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_False_dim_1_pred_True_scan_length_1, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_False_dim_1_pred_True_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_False_dim_3_pred_True_scan_length_1, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_True_dim_0_pred_False_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_True_dim_0_pred_True_scan_length_1, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_True_dim_1_pred_False_scan_length_1, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_True_dim_1_pred_True_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_True_dim_3_pred_False_scan_length_1, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_True_dim_3_pred_True_scan_length_1, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_True_dim_3_pred_True_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_False_reverse_False_dim_0_pred_False_scan_length_1, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_False_reverse_False_dim_0_pred_False_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_False_reverse_False_dim_1_pred_False_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_False_reverse_False_dim_1_pred_True_scan_length_1, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_False_reverse_False_dim_3_pred_False_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_False_reverse_False_dim_3_pred_True_scan_length_1, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_False_reverse_True_dim_1_pred_False_scan_length_1, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_False_reverse_True_dim_1_pred_True_scan_length_1, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_False_reverse_True_dim_3_pred_False_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_False_reverse_True_dim_3_pred_True_scan_length_1, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_True_reverse_False_dim_0_pred_False_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_True_reverse_False_dim_0_pred_True_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_True_reverse_False_dim_1_pred_False_scan_length_1, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_True_reverse_False_dim_1_pred_True_scan_length_1, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_True_reverse_False_dim_3_pred_False_scan_length_1, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_True_reverse_False_dim_3_pred_False_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_True_reverse_False_dim_3_pred_True_scan_length_1, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_True_reverse_False_dim_3_pred_True_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_True_reverse_True_dim_0_pred_False_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_True_reverse_True_dim_1_pred_True_scan_length_1, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_False_reverse_False_dim_0_scan_length_1, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_False_reverse_False_dim_1_scan_length_1, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_False_reverse_False_dim_1_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_False_reverse_True_dim_0_scan_length_1, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_False_reverse_True_dim_1_scan_length_1, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_False_reverse_True_dim_1_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_False_reverse_True_dim_3_scan_length_1, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_False_reverse_True_dim_3_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_True_reverse_False_dim_0_scan_length_1, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_True_reverse_False_dim_0_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_True_reverse_False_dim_3_scan_length_1, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_True_reverse_True_dim_0_scan_length_1, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_True_reverse_True_dim_0_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_True_reverse_True_dim_3_scan_length_1, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_False_reverse_False_dim_0_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_False_reverse_False_dim_1_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_False_reverse_False_dim_3_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_False_reverse_True_dim_0_scan_length_1, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_False_reverse_True_dim_1_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_False_reverse_True_dim_3_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_True_reverse_False_dim_0_scan_length_1, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_True_reverse_False_dim_0_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_True_reverse_False_dim_1_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_True_reverse_False_dim_3_scan_length_1, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_True_reverse_False_dim_3_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_True_reverse_True_dim_1_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_True_reverse_True_dim_3_scan_length_1, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_True_reverse_True_dim_3_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cpu_dynamic_False_reverse_False_dim_0, test/inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cpu_dynamic_False_reverse_True_dim_0, test/inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cpu_dynamic_False_reverse_True_dim_2, test/inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cpu_dynamic_True_reverse_False_dim_1, test/inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cpu_dynamic_True_reverse_True_dim_1, test/inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cpu_dynamic_True_reverse_True_dim_2, test/inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cuda_dynamic_False_reverse_False_dim_0, test/inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cuda_dynamic_False_reverse_False_dim_1, test/inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cuda_dynamic_False_reverse_True_dim_0, test/inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cuda_dynamic_True_reverse_False_dim_1, test/inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cuda_dynamic_True_reverse_True_dim_0, test/inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cuda_dynamic_True_reverse_True_dim_1, test/inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cuda_dynamic_True_reverse_True_dim_2, test/inductor/test_control_flow.py::ScanTests::test_scan_with_clamp_device_cuda_dynamic_False, test/inductor/test_control_flow.py::ScanTests::test_scan_with_clamp_device_cuda_dynamic_True, test/inductor/test_control_flow.py::MapTests::test_map_pytree_in_out_device_cpu_dynamic_True, test/inductor/test_control_flow.py::MapTests::test_map_pytree_in_out_device_cuda_dynamic_False, test/inductor/test_control_flow.py::MapTests::test_map_simple_device_cpu_dynamic_False, test/inductor/test_control_flow.py::MapTests::test_map_simple_device_cpu_dynamic_True, test/inductor/test_control_flow.py::MapTests::test_map_simple_device_cuda_dynamic_False, test/inductor/test_control_flow.py::MapTests::test_map_simple_device_cuda_dynamic_True 2025-08-14T22:24:05.7896091Z 2025-08-14T22:24:05.7896301Z Running inductor/test_torchbind 1/1 ... [2025-08-14 22:24:05.769338] 2025-08-14T22:24:05.7896709Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T22:24:05.7897691Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_torchbind.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 22:24:05.769639] 2025-08-14T22:24:43.6641179Z 2025-08-14T22:24:43.6642513Z inductor/test_torchbind 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_torchbind_1.1_9d37c75f51290304_.log 2025-08-14T22:24:43.6652041Z Running 16 items in this shard: test/inductor/test_torchbind.py::TestTorchbind::test_aoti_torchbind_name_collision, test/inductor/test_torchbind.py::TestTorchbind::test_torchbind_aot_compile, test/inductor/test_torchbind.py::TestTorchbind::test_torchbind_aot_compile_constant_folding, test/inductor/test_torchbind.py::TestTorchbind::test_torchbind_aoti, test/inductor/test_torchbind.py::TestTorchbind::test_torchbind_compile, test/inductor/test_torchbind.py::TestTorchbind::test_torchbind_compile_gpu_op_symint_graph_partition, test/inductor/test_torchbind.py::TestTorchbind::test_torchbind_compile_symint, test/inductor/test_torchbind.py::TestTorchbind::test_torchbind_config_not_generated, test/inductor/test_torchbind.py::TestTorchbind::test_torchbind_get_buf_bytes, test/inductor/test_torchbind.py::TestTorchbind::test_torchbind_hop_schema, test/inductor/test_torchbind.py::TestTorchbind::test_torchbind_hop_schema_no_input, test/inductor/test_torchbind.py::TestTorchbind::test_torchbind_hop_schema_no_output, test/inductor/test_torchbind.py::TestTorchbind::test_torchbind_inductor, test/inductor/test_torchbind.py::TestTorchbind::test_torchbind_input_aot_compile, test/inductor/test_torchbind.py::TestTorchbind::test_torchbind_list_return_aot_compile, test/inductor/test_torchbind.py::TestTorchbind::test_torchbind_queue 2025-08-14T22:24:43.6660732Z 2025-08-14T22:24:43.6661043Z Running export/test_export 1/1 ... [2025-08-14 22:24:43.663850] 2025-08-14T22:24:43.6662338Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T22:24:43.6664465Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'export/test_export.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 22:24:43.664366] 2025-08-14T22:26:10.1204869Z 2025-08-14T22:26:10.1206036Z export/test_export 1/1 was successful, full logs can be found in artifacts with path test/test-reports/export.test_export_1.1_837eac7a85978c19_.log 2025-08-14T22:26:10.1357049Z Running 435 items in this shard: test/export/test_export.py::TestDynamismExpression::test_export_assume_static_by_default, test/export/test_export.py::TestDynamismExpression::test_export_constraints_error, test/export/test_export.py::TestDynamismExpression::test_export_constraints_error_not_in_range, test/export/test_export.py::TestDynamismExpression::test_export_inline_constraints, test/export/test_export.py::TestDynamismExpression::test_export_slice_maxsize, test/export/test_export.py::TestDynamismExpression::test_export_slice_unbacked_dim1, test/export/test_export.py::TestDynamismExpression::test_export_strict_narrow_unbacked_expr, test/export/test_export.py::TestDynamismExpression::test_no_grad_param_inplace, test/export/test_export.py::TestDynamismExpression::test_reshape_view_backed_size_oblivious, test/export/test_export.py::TestExport::test__scaled_dot_product_flash_attention, test/export/test_export.py::TestExport::test_additional_inputs_constants, test/export/test_export.py::TestExport::test_allow_explicit_guards_as_runtime_asserts, test/export/test_export.py::TestExport::test_args_type_checked, test/export/test_export.py::TestExport::test_aten_lift_fresh_copy, test/export/test_export.py::TestExport::test_attention, test/export/test_export.py::TestExport::test_attr_assignment_extra, test/export/test_export.py::TestExport::test_automatic_constrain_size, test/export/test_export.py::TestExport::test_automatic_dynamic_shapes_constant_relation, test/export/test_export.py::TestExport::test_automatic_dynamic_shapes_linear_relation, test/export/test_export.py::TestExport::test_automatic_dynamic_shapes_simple_equality, test/export/test_export.py::TestExport::test_baddbmm, test/export/test_export.py::TestExport::test_basic, test/export/test_export.py::TestExport::test_basic_non_strict_fake_tensor, test/export/test_export.py::TestExport::test_basic_non_strict_real_tensor, test/export/test_export.py::TestExport::test_bincount, test/export/test_export.py::TestExport::test_buffer_util, test/export/test_export.py::TestExport::test_capture_subclass_constructor, test/export/test_export.py::TestExport::test_capture_subclass_constructor_torch_ir, test/export/test_export.py::TestExport::test_capture_subclass_wrong, test/export/test_export.py::TestExport::test_ccode_python_mod, test/export/test_export.py::TestExport::test_check_specialized_int, test/export/test_export.py::TestExport::test_checks_to_constrain_range, test/export/test_export.py::TestExport::test_cleanup_dynamic_markers, test/export/test_export.py::TestExport::test_colin_unbacked_backed_vr_sub, test/export/test_export.py::TestExport::test_colon_parameter, test/export/test_export.py::TestExport::test_compiling_state, test/export/test_export.py::TestExport::test_cond_access_identical_symint_closure, test/export/test_export.py::TestExport::test_cond_branches_return_constant_int, test/export/test_export.py::TestExport::test_cond_branches_return_same_int, test/export/test_export.py::TestExport::test_cond_buffers, test/export/test_export.py::TestExport::test_cond_contains_unbacked_no_escape, test/export/test_export.py::TestExport::test_cond_int_closure, test/export/test_export.py::TestExport::test_cond_unflatten, test/export/test_export.py::TestExport::test_cond_with_module_stack_export_with, test/export/test_export.py::TestExport::test_cond_with_module_stack_export_with_unflatten, test/export/test_export.py::TestExport::test_constant_aliasing, test/export/test_export.py::TestExport::test_constant_input_naming, test/export/test_export.py::TestExport::test_constant_no_user_inp, test/export/test_export.py::TestExport::test_constant_output, test/export/test_export.py::TestExport::test_constant_output_dup, test/export/test_export.py::TestExport::test_constant_requires_grad_const, test/export/test_export.py::TestExport::test_constant_return, test/export/test_export.py::TestExport::test_constant_tensor_mutation, test/export/test_export.py::TestExport::test_constant_tensor_with_non_functional, test/export/test_export.py::TestExport::test_constant_tensor_with_non_functional_nested, test/export/test_export.py::TestExport::test_constrain_decomp, test/export/test_export.py::TestExport::test_constrain_size_in_eager, test/export/test_export.py::TestExport::test_constrain_size_with_constrain_value, test/export/test_export.py::TestExport::test_constrain_size_with_various_cases, test/export/test_export.py::TestExport::test_conv_dynamic, test/export/test_export.py::TestExport::test_crop_like, test/export/test_export.py::TestExport::test_cse_for_symint, test/export/test_export.py::TestExport::test_custom_op_auto_functionalize, test/export/test_export.py::TestExport::test_custom_op_auto_functionalize_pre_dispatch, test/export/test_export.py::TestExport::test_custom_op_auto_warn_pre_dispatch, test/export/test_export.py::TestExport::test_custom_op_preserve, test/export/test_export.py::TestExport::test_custom_pytree, test/export/test_export.py::TestExport::test_custom_tag_metadata_re_export, test/export/test_export.py::TestExport::test_decomp_batch_norm_functional_predispatch, test/export/test_export.py::TestExport::test_decomp_item_in_prim_after_decomposition, test/export/test_export.py::TestExport::test_decomp_item_in_prim_before_decomposition, test/export/test_export.py::TestExport::test_default_decomposition_core_cia_ops, test/export/test_export.py::TestExport::test_derived_dim_1_2, test/export/test_export.py::TestExport::test_derived_dim_basic, test/export/test_export.py::TestExport::test_derived_dim_integer, test/export/test_export.py::TestExport::test_derived_dim_nested, test/export/test_export.py::TestExport::test_derived_dim_out_of_order, test/export/test_export.py::TestExport::test_derived_dim_out_of_order_repeat_derived, test/export/test_export.py::TestExport::test_derived_dim_out_of_order_simplified, test/export/test_export.py::TestExport::test_derived_dim_out_of_order_simplified_repeat_non_derived, test/export/test_export.py::TestExport::test_derived_dim_repeat_derived, test/export/test_export.py::TestExport::test_detect_leak_strict, test/export/test_export.py::TestExport::test_device_to_dynamic, test/export/test_export.py::TestExport::test_device_to_gpu, test/export/test_export.py::TestExport::test_device_to_mutation, test/export/test_export.py::TestExport::test_device_to_mutation_float, test/export/test_export.py::TestExport::test_device_to_static, test/export/test_export.py::TestExport::test_dim_1_2, test/export/test_export.py::TestExport::test_dim_auto_and_dim, test/export/test_export.py::TestExport::test_dim_dynamic, test/export/test_export.py::TestExport::test_dim_dynamic_divisibility, test/export/test_export.py::TestExport::test_dim_dynamic_specialization, test/export/test_export.py::TestExport::test_dim_hint_range_violations, test/export/test_export.py::TestExport::test_dim_hint_ranges, test/export/test_export.py::TestExport::test_disable_forced_specializations_errors, test/export/test_export.py::TestExport::test_disable_forced_specializations_ok, test/export/test_export.py::TestExport::test_distributed_all_gather, test/export/test_export.py::TestExport::test_distributed_all_gather_into_tensor, test/export/test_export.py::TestExport::test_distributed_all_reduce, test/export/test_export.py::TestExport::test_distributed_all_to_all_single, test/export/test_export.py::TestExport::test_distributed_reduce_scatter_tensor, test/export/test_export.py::TestExport::test_dont_duck_size_for_auto_dynamic, test/export/test_export.py::TestExport::test_double_lifted_constants, test/export/test_export.py::TestExport::test_draft_export_checks_aliasing, test/export/test_export.py::TestExport::test_draft_export_checks_mutation, test/export/test_export.py::TestExport::test_draft_export_checks_mutation_list, test/export/test_export.py::TestExport::test_draft_export_checks_mutation_with_nan, test/export/test_export.py::TestExport::test_draft_export_fake_kernel_inference_errors, test/export/test_export.py::TestExport::test_draft_export_infers_fake_kernel, test/export/test_export.py::TestExport::test_duplicate_modules_with_non_persistent_buffers, test/export/test_export.py::TestExport::test_dynamic_lr_shift, test/export/test_export.py::TestExport::test_dynamic_shapes_bounds, test/export/test_export.py::TestExport::test_dynamic_shapes_builder_basic, test/export/test_export.py::TestExport::test_dynamic_shapes_builder_kwargs, test/export/test_export.py::TestExport::test_dynamic_shapes_builder_pytree, test/export/test_export.py::TestExport::test_dynamic_shapes_dataclass, test/export/test_export.py::TestExport::test_dynamic_shapes_inferred_basic, test/export/test_export.py::TestExport::test_dynamic_shapes_serdes_generic, test/export/test_export.py::TestExport::test_dynamic_shapes_serdes_user_errors, test/export/test_export.py::TestExport::test_dynamic_shapes_serdes_various, test/export/test_export.py::TestExport::test_dynamic_shapes_spec_with_pytree, test/export/test_export.py::TestExport::test_dynamic_sym_round, test/export/test_export.py::TestExport::test_ends_of_bounds_oblivious, test/export/test_export.py::TestExport::test_error_does_not_reference_eager_fallback, test/export/test_export.py::TestExport::test_error_when_passing_mutating_primitive_op, test/export/test_export.py::TestExport::test_exception, test/export/test_export.py::TestExport::test_export_api_with_dynamic_shapes, test/export/test_export.py::TestExport::test_export_as_backend, test/export/test_export.py::TestExport::test_export_associative_scan_lifted_buffers, test/export/test_export.py::TestExport::test_export_associative_scan_symbol_dim, test/export/test_export.py::TestExport::test_export_associative_scan_symbol_scandim, test/export/test_export.py::TestExport::test_export_aten_to_unflatten, test/export/test_export.py::TestExport::test_export_aten_to_unflatten_subclass, test/export/test_export.py::TestExport::test_export_aten_to_unflatten_subclass_pre_dispatch, test/export/test_export.py::TestExport::test_export_cond_preserve_torch_fn_for_subgraphs, test/export/test_export.py::TestExport::test_export_cond_symbool_pred, test/export/test_export.py::TestExport::test_export_cond_warns_constant_pred, test/export/test_export.py::TestExport::test_export_custom_decomp_table_basic_pop, test/export/test_export.py::TestExport::test_export_custom_decomp_table_container_methods, test/export/test_export.py::TestExport::test_export_custom_op_lib, test/export/test_export.py::TestExport::test_export_custom_triton_kernel, test/export/test_export.py::TestExport::test_export_custom_triton_kernel_mutable, test/export/test_export.py::TestExport::test_export_decomp_torture_case_1, test/export/test_export.py::TestExport::test_export_decomp_torture_case_2, test/export/test_export.py::TestExport::test_export_decomps_dynamic, test/export/test_export.py::TestExport::test_export_decomps_simple, test/export/test_export.py::TestExport::test_export_dynamo_config, test/export/test_export.py::TestExport::test_export_for_training_run_decomp, test/export/test_export.py::TestExport::test_export_for_training_with_container_type, test/export/test_export.py::TestExport::test_export_for_training_with_dynamic_shapes, test/export/test_export.py::TestExport::test_export_for_training_with_mutation, test/export/test_export.py::TestExport::test_export_for_training_with_state_dict_hooks, test/export/test_export.py::TestExport::test_export_func_with_default_kwargs, test/export/test_export.py::TestExport::test_export_func_with_keyword_only_args, test/export/test_export.py::TestExport::test_export_func_with_kwargs, test/export/test_export.py::TestExport::test_export_func_with_pytree_kwargs, test/export/test_export.py::TestExport::test_export_func_with_var_keyword_args, test/export/test_export.py::TestExport::test_export_func_with_var_keyword_pytree_args, test/export/test_export.py::TestExport::test_export_func_with_var_postional_args, test/export/test_export.py::TestExport::test_export_function_schema, test/export/test_export.py::TestExport::test_export_graph_with_no_inputs, test/export/test_export.py::TestExport::test_export_input_mutation_bug, test/export/test_export.py::TestExport::test_export_input_mutation_dynamic_shape, test/export/test_export.py::TestExport::test_export_input_mutation_static_shape, test/export/test_export.py::TestExport::test_export_linear_preserve_dynamic_shape, test/export/test_export.py::TestExport::test_export_max_nonstrict, test/export/test_export.py::TestExport::test_export_max_onnx_reported, test/export/test_export.py::TestExport::test_export_method, test/export/test_export.py::TestExport::test_export_mod_constraints, test/export/test_export.py::TestExport::test_export_module, test/export/test_export.py::TestExport::test_export_preserve_linear_at_aot_level, test/export/test_export.py::TestExport::test_export_preserve_linear_but_not_custom_op, test/export/test_export.py::TestExport::test_export_scan_pytree_output, test/export/test_export.py::TestExport::test_export_script_module, test/export/test_export.py::TestExport::test_export_statically_known_true, test/export/test_export.py::TestExport::test_export_then_compile_tensor_ctor, test/export/test_export.py::TestExport::test_export_with_autocast, test/export/test_export.py::TestExport::test_export_with_fake_tensor_inputs, test/export/test_export.py::TestExport::test_export_with_fake_tensor_inputs_on_cuda_devices, test/export/test_export.py::TestExport::test_export_with_inline_constraints, test/export/test_export.py::TestExport::test_export_with_inline_constraints_complex, test/export/test_export.py::TestExport::test_export_with_set_grad_enabled, test/export/test_export.py::TestExport::test_export_with_wrong_inputs, test/export/test_export.py::TestExport::test_external_call_non_strict_real_tensor, test/export/test_export.py::TestExport::test_fake_inputs, test/export/test_export.py::TestExport::test_fake_weights, test/export/test_export.py::TestExport::test_filter_traceback_frames, test/export/test_export.py::TestExport::test_float_conversion, test/export/test_export.py::TestExport::test_float_conversion_from_int, test/export/test_export.py::TestExport::test_fqn, test/export/test_export.py::TestExport::test_from_node_metadata_export, test/export/test_export.py::TestExport::test_full_on_scalar_tensor, test/export/test_export.py::TestExport::test_hints_wrapper, test/export/test_export.py::TestExport::test_hoo_inline_users_issue, test/export/test_export.py::TestExport::test_if_functional, test/export/test_export.py::TestExport::test_if_post_autograd_op_preserved, test/export/test_export.py::TestExport::test_inline_script_class_method, test/export/test_export.py::TestExport::test_inline_script_class_method_recursive, test/export/test_export.py::TestExport::test_inline_script_function, test/export/test_export.py::TestExport::test_inline_script_method, test/export/test_export.py::TestExport::test_int_shape_specialization, test/export/test_export.py::TestExport::test_intermediate_shape_comp, test/export/test_export.py::TestExport::test_is_exporting, test/export/test_export.py::TestExport::test_is_non_negative_check_function, test/export/test_export.py::TestExport::test_is_nonzero, test/export/test_export.py::TestExport::test_isnonzero, test/export/test_export.py::TestExport::test_issue_113041, test/export/test_export.py::TestExport::test_issue_157289, test/export/test_export.py::TestExport::test_istft_op, test/export/test_export.py::TestExport::test_keep_composite_ops_invalid, test/export/test_export.py::TestExport::test_keep_composite_ops_linear_convd, test/export/test_export.py::TestExport::test_keep_composite_ops_linear_convd_for_training_ir, test/export/test_export.py::TestExport::test_kwarg_dynamic_shapes_diff_order, test/export/test_export.py::TestExport::test_kwargs_reorder, test/export/test_export.py::TestExport::test_layer_sharing, test/export/test_export.py::TestExport::test_lazy_module_kwargs, test/export/test_export.py::TestExport::test_lifted_constants, test/export/test_export.py::TestExport::test_linear_conv, test/export/test_export.py::TestExport::test_malformed_fqn_from_source_name, test/export/test_export.py::TestExport::test_map, test/export/test_export.py::TestExport::test_map_buffers, test/export/test_export.py::TestExport::test_mask_nonzero_static, test/export/test_export.py::TestExport::test_masked_select_dynamic, test/export/test_export.py::TestExport::test_math_pow, test/export/test_export.py::TestExport::test_mismatched_dynamic_shapes, test/export/test_export.py::TestExport::test_mixed_input, test/export/test_export.py::TestExport::test_module, test/export/test_export.py::TestExport::test_module_dict_key, test/export/test_export.py::TestExport::test_module_input, test/export/test_export.py::TestExport::test_module_input_subclasses_parameterization_nested, test/export/test_export.py::TestExport::test_module_list_slice, test/export/test_export.py::TestExport::test_module_with_dict_container_inp_out, test/export/test_export.py::TestExport::test_modules_access_for_deleted_submodule, test/export/test_export.py::TestExport::test_more_multidimensional_slicing, test/export/test_export.py::TestExport::test_multidimensional_slicing, test/export/test_export.py::TestExport::test_multinomial_dynamic, test/export/test_export.py::TestExport::test_multiple_definitions_same_name_dim, test/export/test_export.py::TestExport::test_nested_dynamic_shapes_spec, test/export/test_export.py::TestExport::test_nested_module, test/export/test_export.py::TestExport::test_nested_module_with_constant_buffer, test/export/test_export.py::TestExport::test_nested_module_with_init_buffer, test/export/test_export.py::TestExport::test_nested_module_with_parameter, test/export/test_export.py::TestExport::test_nn_module_stack, test/export/test_export.py::TestExport::test_nn_module_stack_shared_submodule, test/export/test_export.py::TestExport::test_no_check_is_size_error, test/export/test_export.py::TestExport::test_no_suggested_fixes_for_data_dependent_errors, test/export/test_export.py::TestExport::test_no_tensor_computation, test/export/test_export.py::TestExport::test_no_tensor_computation_2, test/export/test_export.py::TestExport::test_no_tensor_computation_3, test/export/test_export.py::TestExport::test_no_tensor_computation_4, test/export/test_export.py::TestExport::test_non_arg_name_dynamic_shapes_api, test/export/test_export.py::TestExport::test_non_arg_name_dynamic_shapes_api_with_container_type, test/export/test_export.py::TestExport::test_non_arg_name_dynamic_shapes_api_with_kwarg, test/export/test_export.py::TestExport::test_non_persistent_buffer, test/export/test_export.py::TestExport::test_non_strict_dynamic_shapes, test/export/test_export.py::TestExport::test_non_strict_dynamic_shapes_suggested_fixes, test/export/test_export.py::TestExport::test_none_buffers, test/export/test_export.py::TestExport::test_nonstrict_retrace_preserves_metadata, test/export/test_export.py::TestExport::test_nonzero_2, test/export/test_export.py::TestExport::test_nonzero_dynamic, test/export/test_export.py::TestExport::test_not_registered_parameter, test/export/test_export.py::TestExport::test_operator_aten_tensor_mode_variant, test/export/test_export.py::TestExport::test_output_node_name, test/export/test_export.py::TestExport::test_pad_sequence, test/export/test_export.py::TestExport::test_param_util, test/export/test_export.py::TestExport::test_partial_patched_forward, test/export/test_export.py::TestExport::test_placeholder_naming_collisions, test/export/test_export.py::TestExport::test_placeholder_naming_collisions_hoo_subgraphs, test/export/test_export.py::TestExport::test_placeholder_naming_order, test/export/test_export.py::TestExport::test_placeholder_naming_order_variadic, test/export/test_export.py::TestExport::test_placeholder_update_preserving, test/export/test_export.py::TestExport::test_predispatch_cond, test/export/test_export.py::TestExport::test_predispatch_grad_wrappers, test/export/test_export.py::TestExport::test_preserve_module_call_signature_unflatten_specialization, test/export/test_export.py::TestExport::test_preserve_requires_grad_placeholders, test/export/test_export.py::TestExport::test_preserve_shape_dynamism_for_unused_inputs, test/export/test_export.py::TestExport::test_profiling_code, test/export/test_export.py::TestExport::test_python_asserts_with_sym_int, test/export/test_export.py::TestExport::test_pytree_register_data_class, test/export/test_export.py::TestExport::test_pytree_register_nested_data_class, test/export/test_export.py::TestExport::test_raise_user_error_when_guard_on_data_dependent_operation, test/export/test_export.py::TestExport::test_range_constraints_with_replacement, test/export/test_export.py::TestExport::test_real_tensor_alias_dtype_mismatch, test/export/test_export.py::TestExport::test_real_tensor_bool_cast, test/export/test_export.py::TestExport::test_real_tensor_errors_on_aliasing_custom_op, test/export/test_export.py::TestExport::test_real_tensor_for_max_op, test/export/test_export.py::TestExport::test_real_tensor_size_mismatch, test/export/test_export.py::TestExport::test_redundant_assert_max_upper_bound, test/export/test_export.py::TestExport::test_redundant_asserts, test/export/test_export.py::TestExport::test_refine_dynamic_shapes_from_suggested_fixes, test/export/test_export.py::TestExport::test_register_constant, test/export/test_export.py::TestExport::test_repeat_interleave, test/export/test_export.py::TestExport::test_replace_unbacked_with_very_large_upperbound, test/export/test_export.py::TestExport::test_replaced_unbacked_bindings, test/export/test_export.py::TestExport::test_reshape_view_helper, test/export/test_export.py::TestExport::test_retracable_ep, test/export/test_export.py::TestExport::test_retrace_pre_autograd, test/export/test_export.py::TestExport::test_run_decomposition_supports_user_input_mutation, test/export/test_export.py::TestExport::test_run_decompositions_keep_metadata, test/export/test_export.py::TestExport::test_run_decompositions_keep_tensor_constant_metadata, test/export/test_export.py::TestExport::test_runtime_assert_for_prim, test/export/test_export.py::TestExport::test_runtime_assert_for_prm_str, test/export/test_export.py::TestExport::test_runtime_assert_with_size, test/export/test_export.py::TestExport::test_sdpa_gqa, test/export/test_export.py::TestExport::test_sequential_slicing, test/export/test_export.py::TestExport::test_set_example_inputs, test/export/test_export.py::TestExport::test_set_grad_as_side_effect, test/export/test_export.py::TestExport::test_set_grad_empty, test/export/test_export.py::TestExport::test_set_grad_unflatten, test/export/test_export.py::TestExport::test_setgrad_lifted_tensor, test/export/test_export.py::TestExport::test_shared_submodule_nn_module_stack, test/export/test_export.py::TestExport::test_simple_export_for_training, test/export/test_export.py::TestExport::test_simple_unbacked_view, test/export/test_export.py::TestExport::test_size_input, test/export/test_export.py::TestExport::test_slice_nn_module_stack, test/export/test_export.py::TestExport::test_solver_unsupported_sympy_function, test/export/test_export.py::TestExport::test_specialize_derived_dim_roots, test/export/test_export.py::TestExport::test_split_const_gm_with_lifted_constants, test/export/test_export.py::TestExport::test_stack_trace, test/export/test_export.py::TestExport::test_stack_trace_make_fx, test/export/test_export.py::TestExport::test_state_primitives, test/export/test_export.py::TestExport::test_state_shape_attribute_assignment, test/export/test_export.py::TestExport::test_state_tensors, test/export/test_export.py::TestExport::test_static_dim_constraints, test/export/test_export.py::TestExport::test_subclass_nested_attr_access, test/export/test_export.py::TestExport::test_subclass_nested_attr_access_complicated_metadata, test/export/test_export.py::TestExport::test_subclass_nested_attr_access_const_metadata, test/export/test_export.py::TestExport::test_subclass_nested_attr_access_const_metadata_not_top_level, test/export/test_export.py::TestExport::test_subclass_nested_attr_access_submodule, test/export/test_export.py::TestExport::test_subclasses_parameterization, test/export/test_export.py::TestExport::test_subclasses_parameterization_nested, test/export/test_export.py::TestExport::test_suggest_torch_checks_with_non_negative_check, test/export/test_export.py::TestExport::test_suggest_torch_checks_with_regular_check, test/export/test_export.py::TestExport::test_suggested_fixes_for_data_dependent_errors_basic, test/export/test_export.py::TestExport::test_suggested_fixes_for_data_dependent_errors_puzzlers, test/export/test_export.py::TestExport::test_suggested_fixes_new_roots, test/export/test_export.py::TestExport::test_sym_float_operators, test/export/test_export.py::TestExport::test_sym_or_sym_and, test/export/test_export.py::TestExport::test_sym_sqrt, test/export/test_export.py::TestExport::test_symbool_item, test/export/test_export.py::TestExport::test_symfloat_item, test/export/test_export.py::TestExport::test_symint_input_additional_inputs, test/export/test_export.py::TestExport::test_symint_input_basic, test/export/test_export.py::TestExport::test_symint_input_ranges, test/export/test_export.py::TestExport::test_symint_input_shapes_collection, test/export/test_export.py::TestExport::test_symint_input_specialization, test/export/test_export.py::TestExport::test_symint_item, test/export/test_export.py::TestExport::test_symint_output, test/export/test_export.py::TestExport::test_symint_tensor_return, test/export/test_export.py::TestExport::test_tensor_attribute_zero_args, test/export/test_export.py::TestExport::test_tensor_constant_aten_to, test/export/test_export.py::TestExport::test_tensor_constant_with_wrapped_method, test/export/test_export.py::TestExport::test_to_module_with_mutated_buffer, test/export/test_export.py::TestExport::test_to_module_with_mutated_buffer_multiple, test/export/test_export.py::TestExport::test_to_module_with_mutated_buffer_multiple_update_sub_later, test/export/test_export.py::TestExport::test_tolist, test/export/test_export.py::TestExport::test_torch_check_eq_commutativity, test/export/test_export.py::TestExport::test_torch_fn, test/export/test_export.py::TestExport::test_trace_under_fake, test/export/test_export.py::TestExport::test_train_eval_on_exported_preautograd_module, test/export/test_export.py::TestExport::test_unbacked_3d_matmul, test/export/test_export.py::TestExport::test_unbacked_bincount, test/export/test_export.py::TestExport::test_unbacked_bindings_for_divisible_u_symint, test/export/test_export.py::TestExport::test_unbacked_deferred_runtime_retrace, test/export/test_export.py::TestExport::test_unbacked_expand, test/export/test_export.py::TestExport::test_unbacked_infer_size, test/export/test_export.py::TestExport::test_unbacked_kth_value, test/export/test_export.py::TestExport::test_unbacked_linear_layer_norm_input, test/export/test_export.py::TestExport::test_unbacked_noncontig_lin, test/export/test_export.py::TestExport::test_unbacked_pad, test/export/test_export.py::TestExport::test_unbacked_scalar_constructor, test/export/test_export.py::TestExport::test_unbacked_slice, test/export/test_export.py::TestExport::test_unbacked_to_cond, test/export/test_export.py::TestExport::test_unbacked_to_cond_passthrough, test/export/test_export.py::TestExport::test_unbacked_unsqueeze, test/export/test_export.py::TestExport::test_unflatten_asserts, test/export/test_export.py::TestExport::test_unflatten_buffer_update_child2parent_swap, test/export/test_export.py::TestExport::test_unflatten_closure, test/export/test_export.py::TestExport::test_unflatten_isinstance, test/export/test_export.py::TestExport::test_unflatten_multiple_graphs_dispatch, test/export/test_export.py::TestExport::test_unflatten_multiple_graphs_preserve_signature_no_error, test/export/test_export.py::TestExport::test_unflatten_multiple_graphs_shared_submodule, test/export/test_export.py::TestExport::test_unflatten_multiple_graphs_state, test/export/test_export.py::TestExport::test_unflatten_no_unroll, test/export/test_export.py::TestExport::test_unflatten_placeholder_update_child2parent_swap, test/export/test_export.py::TestExport::test_unflatten_placeholder_update_grandchild2cousin_swap, test/export/test_export.py::TestExport::test_unflatten_random_dag_5, test/export/test_export.py::TestExport::test_unflatten_random_dag_6, test/export/test_export.py::TestExport::test_unflatten_random_dag_buf_8, test/export/test_export.py::TestExport::test_unflatten_random_dag_const_preserving_3, test/export/test_export.py::TestExport::test_unflatten_random_dag_const_preserving_3_1, test/export/test_export.py::TestExport::test_unflatten_random_dag_mutating_buf_4, test/export/test_export.py::TestExport::test_unflatten_random_dag_mutating_buf_6, test/export/test_export.py::TestExport::test_unflatten_random_dag_mutating_buf_9, test/export/test_export.py::TestExport::test_unflatten_random_dag_mutating_buf_preserving_10, test/export/test_export.py::TestExport::test_unflatten_random_dag_mutating_buf_preserving_4, test/export/test_export.py::TestExport::test_unflatten_random_dag_mutating_buf_preserving_4_1, test/export/test_export.py::TestExport::test_unflatten_random_dag_mutating_buf_preserving_5, test/export/test_export.py::TestExport::test_unflatten_random_dag_mutating_buf_preserving_7, test/export/test_export.py::TestExport::test_unflatten_random_dag_preserving_4, test/export/test_export.py::TestExport::test_unused_aliases, test/export/test_export.py::TestExport::test_unused_constant, test/export/test_export.py::TestExport::test_use_embedding_twice, test/export/test_export.py::TestExport::test_user_input_and_buffer_mutation, test/export/test_export.py::TestExport::test_while_loop_assert_separation, test/export/test_export.py::TestExport::test_while_loop_index_assertions, test/export/test_export.py::TestExport::test_while_loop_simple, test/export/test_export.py::TestExport::test_while_loop_tensor_constant_idx, test/export/test_export.py::TestExport::test_wrapper_module, test/export/test_export.py::TestOneOffModelExportResult::test_assert_tensor_metadata_device_index, test/export/test_export.py::TestOneOffModelExportResult::test_constant_fqn, test/export/test_export.py::TestOneOffModelExportResult::test_constant_name, test/export/test_export.py::TestOneOffModelExportResult::test_duplicated_getitem, test/export/test_export.py::TestOneOffModelExportResult::test_hf_logging_logger, test/export/test_export.py::TestOneOffModelExportResult::test_input_output_no_stacktrace, test/export/test_export.py::TestOneOffModelExportResult::test_int_list_output, test/export/test_export.py::TestOneOffModelExportResult::test_logging_logger, test/export/test_export.py::TestOneOffModelExportResult::test_nested_retrace, test/export/test_export.py::TestOneOffModelExportResult::test_none_input_output, test/export/test_export.py::TestOneOffModelExportResult::test_primitive_constant_output, test/export/test_export.py::TestOneOffModelExportResult::test_print, test/export/test_export.py::TestOneOffModelExportResult::test_print_graph_signature, test/export/test_export.py::TestOneOffModelExportResult::test_scaled_dot_product_attention_cpu, test/export/test_export.py::TestOneOffModelExportResult::test_scaled_dot_product_attention_cuda, test/export/test_export.py::TestOneOffModelExportResult::test_torchrec_jagged_tensor, test/export/test_export.py::TestOneOffModelExportResult::test_unbacked_sdpa, test/export/test_export.py::TestOneOffModelExportResult::test_warning, test/export/test_export.py::TestExportCustomClass::test_export_script_module, test/export/test_export.py::TestExportCustomClass::test_export_unbacked_lt, test/export/test_export.py::TestExportCustomClass::test_int_lift_constant, test/export/test_export.py::TestExportCustomClass::test_lift_custom_obj, test/export/test_export.py::TestExportCustomClass::test_preserve_cia_op, test/export/test_export.py::TestExportCustomClass::test_preserve_non_cia_op, test/export/test_export.py::TestExportCustomClass::test_unbacked_contiguous, test/export/test_export.py::TestExportCustomClass::test_unbacked_select_index 2025-08-14T22:26:10.1490244Z 2025-08-14T22:26:10.1490386Z Running inductor/test_fp8 1/1 ... [2025-08-14 22:26:10.121612] 2025-08-14T22:26:10.1490676Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T22:26:10.1491420Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_fp8.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 22:26:10.122174] 2025-08-14T22:26:10.1492301Z GITHUB_RUN_ID, GITHUB_RUN_ATTEMPT, or ARTIFACTS_FILE_SUFFIX not set, not uploading 2025-08-14T22:26:10.1492638Z Uploading artifacts took 0.00 seconds 2025-08-14T22:27:33.4059630Z 2025-08-14T22:27:33.4060818Z inductor/test_fp8 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_fp8_1.1_40b260eb7ae8d26a_.log 2025-08-14T22:27:33.4210003Z Running 228 items in this shard: test/inductor/test_fp8.py::TestFP8Types::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_device_cpu, test/inductor/test_fp8.py::TestFP8Types::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_device_cuda, test/inductor/test_fp8.py::TestFP8Types::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_device_cpu, test/inductor/test_fp8.py::TestFP8Types::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_device_cuda, test/inductor/test_fp8.py::TestFP8Types::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_device_cpu, test/inductor/test_fp8.py::TestFP8Types::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_device_cuda, test/inductor/test_fp8.py::TestFP8Types::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_device_cpu, test/inductor/test_fp8.py::TestFP8Types::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_device_cuda, test/inductor/test_fp8.py::TestFP8Types::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_device_cpu, test/inductor/test_fp8.py::TestFP8Types::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_device_cuda, test/inductor/test_fp8.py::TestFP8Types::test_amax_along_with_fp8_quant_float8_e5m2_shape_1,1,15_device_cpu, test/inductor/test_fp8.py::TestFP8Types::test_amax_along_with_fp8_quant_float8_e5m2_shape_1,1,15_device_cuda, test/inductor/test_fp8.py::TestFP8Types::test_amax_along_with_fp8_quant_float8_e5m2_shape_1,10,15_device_cpu, test/inductor/test_fp8.py::TestFP8Types::test_amax_along_with_fp8_quant_float8_e5m2_shape_1,10,15_device_cuda, test/inductor/test_fp8.py::TestFP8Types::test_amax_along_with_fp8_quant_float8_e5m2_shape_1,10,4096_device_cpu, test/inductor/test_fp8.py::TestFP8Types::test_amax_along_with_fp8_quant_float8_e5m2_shape_1,10,4096_device_cuda, test/inductor/test_fp8.py::TestFP8Types::test_amax_along_with_fp8_quant_float8_e5m2_shape_1,10,512_device_cpu, test/inductor/test_fp8.py::TestFP8Types::test_amax_along_with_fp8_quant_float8_e5m2_shape_1,10,512_device_cuda, test/inductor/test_fp8.py::TestFP8Types::test_amax_along_with_fp8_quant_float8_e5m2_shape_4,2048,4096_device_cpu, test/inductor/test_fp8.py::TestFP8Types::test_amax_along_with_fp8_quant_float8_e5m2_shape_4,2048,4096_device_cuda, test/inductor/test_fp8.py::TestFP8Types::test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_device_cpu, test/inductor/test_fp8.py::TestFP8Types::test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_device_cuda, test/inductor/test_fp8.py::TestFP8Types::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_device_cpu, test/inductor/test_fp8.py::TestFP8Types::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_device_cuda, test/inductor/test_fp8.py::TestFP8Types::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_device_cpu, test/inductor/test_fp8.py::TestFP8Types::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_device_cuda, test/inductor/test_fp8.py::TestFP8Types::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_device_cpu, test/inductor/test_fp8.py::TestFP8Types::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_device_cuda, test/inductor/test_fp8.py::TestFP8Types::test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_device_cpu, test/inductor/test_fp8.py::TestFP8Types::test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_device_cuda, test/inductor/test_fp8.py::TestFP8Types::test_amax_fp8_quant_float8_e5m2_shape_1,1,15_device_cpu, test/inductor/test_fp8.py::TestFP8Types::test_amax_fp8_quant_float8_e5m2_shape_1,1,15_device_cuda, test/inductor/test_fp8.py::TestFP8Types::test_amax_fp8_quant_float8_e5m2_shape_1,10,15_device_cpu, test/inductor/test_fp8.py::TestFP8Types::test_amax_fp8_quant_float8_e5m2_shape_1,10,15_device_cuda, test/inductor/test_fp8.py::TestFP8Types::test_amax_fp8_quant_float8_e5m2_shape_1,10,4096_device_cpu, test/inductor/test_fp8.py::TestFP8Types::test_amax_fp8_quant_float8_e5m2_shape_1,10,4096_device_cuda, test/inductor/test_fp8.py::TestFP8Types::test_amax_fp8_quant_float8_e5m2_shape_1,10,512_device_cpu, test/inductor/test_fp8.py::TestFP8Types::test_amax_fp8_quant_float8_e5m2_shape_1,10,512_device_cuda, test/inductor/test_fp8.py::TestFP8Types::test_amax_fp8_quant_float8_e5m2_shape_4,2048,4096_device_cpu, test/inductor/test_fp8.py::TestFP8Types::test_amax_fp8_quant_float8_e5m2_shape_4,2048,4096_device_cuda, test/inductor/test_fp8.py::TestFP8Types::test_bad_cast, test/inductor/test_fp8.py::TestFP8Types::test_eager_fallback_bfloat16_device_cpu, test/inductor/test_fp8.py::TestFP8Types::test_eager_fallback_bfloat16_device_cuda, test/inductor/test_fp8.py::TestFP8Types::test_eager_fallback_float16_device_cpu, test/inductor/test_fp8.py::TestFP8Types::test_eager_fallback_float16_device_cuda, test/inductor/test_fp8.py::TestFP8Types::test_layernorm_fp8_quant_benchmark_float8_e4m3fn_shape_4,2048,4096_keepdim_False, test/inductor/test_fp8.py::TestFP8Types::test_layernorm_fp8_quant_benchmark_float8_e4m3fn_shape_4,2048,4096_keepdim_True, test/inductor/test_fp8.py::TestFP8Types::test_layernorm_fp8_quant_benchmark_float8_e5m2_shape_4,2048,4096_keepdim_False, test/inductor/test_fp8.py::TestFP8Types::test_layernorm_fp8_quant_benchmark_float8_e5m2_shape_4,2048,4096_keepdim_True, test/inductor/test_fp8.py::TestFP8Types::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_device_cpu, test/inductor/test_fp8.py::TestFP8Types::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_device_cuda, test/inductor/test_fp8.py::TestFP8Types::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_device_cpu, test/inductor/test_fp8.py::TestFP8Types::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_device_cuda, test/inductor/test_fp8.py::TestFP8Types::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_device_cpu, test/inductor/test_fp8.py::TestFP8Types::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_device_cuda, test/inductor/test_fp8.py::TestFP8Types::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_device_cpu, test/inductor/test_fp8.py::TestFP8Types::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_device_cuda, test/inductor/test_fp8.py::TestFP8Types::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_device_cpu, test/inductor/test_fp8.py::TestFP8Types::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_device_cuda, test/inductor/test_fp8.py::TestFP8Types::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_device_cpu, test/inductor/test_fp8.py::TestFP8Types::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_device_cuda, test/inductor/test_fp8.py::TestFP8Types::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_device_cpu, test/inductor/test_fp8.py::TestFP8Types::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_device_cuda, test/inductor/test_fp8.py::TestFP8Types::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_device_cpu, test/inductor/test_fp8.py::TestFP8Types::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_device_cuda, test/inductor/test_fp8.py::TestFP8Types::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_device_cpu, test/inductor/test_fp8.py::TestFP8Types::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_device_cuda, test/inductor/test_fp8.py::TestFP8Types::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_device_cpu, test/inductor/test_fp8.py::TestFP8Types::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_device_cuda, test/inductor/test_fp8.py::TestFP8Types::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_False_shape_1,1,15_device_cpu, test/inductor/test_fp8.py::TestFP8Types::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_False_shape_1,1,15_device_cuda, test/inductor/test_fp8.py::TestFP8Types::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_False_shape_1,10,15_device_cpu, test/inductor/test_fp8.py::TestFP8Types::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_False_shape_1,10,15_device_cuda, test/inductor/test_fp8.py::TestFP8Types::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_False_shape_1,10,4096_device_cpu, test/inductor/test_fp8.py::TestFP8Types::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_False_shape_1,10,4096_device_cuda, test/inductor/test_fp8.py::TestFP8Types::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_False_shape_1,10,512_device_cpu, test/inductor/test_fp8.py::TestFP8Types::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_False_shape_1,10,512_device_cuda, test/inductor/test_fp8.py::TestFP8Types::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_False_shape_4,2048,4096_device_cpu, test/inductor/test_fp8.py::TestFP8Types::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_False_shape_4,2048,4096_device_cuda, test/inductor/test_fp8.py::TestFP8Types::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_True_shape_1,1,15_device_cpu, test/inductor/test_fp8.py::TestFP8Types::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_True_shape_1,1,15_device_cuda, test/inductor/test_fp8.py::TestFP8Types::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_True_shape_1,10,15_device_cpu, test/inductor/test_fp8.py::TestFP8Types::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_True_shape_1,10,15_device_cuda, test/inductor/test_fp8.py::TestFP8Types::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_True_shape_1,10,4096_device_cpu, test/inductor/test_fp8.py::TestFP8Types::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_True_shape_1,10,4096_device_cuda, test/inductor/test_fp8.py::TestFP8Types::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_True_shape_1,10,512_device_cpu, test/inductor/test_fp8.py::TestFP8Types::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_True_shape_1,10,512_device_cuda, test/inductor/test_fp8.py::TestFP8Types::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_True_shape_4,2048,4096_device_cpu, test/inductor/test_fp8.py::TestFP8Types::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_True_shape_4,2048,4096_device_cuda, test/inductor/test_fp8.py::TestFP8Types::test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_16,16,16_device_cpu, test/inductor/test_fp8.py::TestFP8Types::test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_16,16,16_device_cuda, test/inductor/test_fp8.py::TestFP8Types::test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_4,2048,4096_device_cpu, test/inductor/test_fp8.py::TestFP8Types::test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_4,2048,4096_device_cuda, test/inductor/test_fp8.py::TestFP8Types::test_to_fp8_saturated_bfloat16_float8_e5m2_shape_16,16,16_device_cpu, test/inductor/test_fp8.py::TestFP8Types::test_to_fp8_saturated_bfloat16_float8_e5m2_shape_16,16,16_device_cuda, test/inductor/test_fp8.py::TestFP8Types::test_to_fp8_saturated_bfloat16_float8_e5m2_shape_4,2048,4096_device_cpu, test/inductor/test_fp8.py::TestFP8Types::test_to_fp8_saturated_bfloat16_float8_e5m2_shape_4,2048,4096_device_cuda, test/inductor/test_fp8.py::TestFP8Types::test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_device_cpu, test/inductor/test_fp8.py::TestFP8Types::test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_device_cuda, test/inductor/test_fp8.py::TestFP8Types::test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_device_cpu, test/inductor/test_fp8.py::TestFP8Types::test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_device_cuda, test/inductor/test_fp8.py::TestFP8Types::test_to_fp8_saturated_float16_float8_e5m2_shape_16,16,16_device_cpu, test/inductor/test_fp8.py::TestFP8Types::test_to_fp8_saturated_float16_float8_e5m2_shape_16,16,16_device_cuda, test/inductor/test_fp8.py::TestFP8Types::test_to_fp8_saturated_float16_float8_e5m2_shape_4,2048,4096_device_cpu, test/inductor/test_fp8.py::TestFP8Types::test_to_fp8_saturated_float16_float8_e5m2_shape_4,2048,4096_device_cuda, test/inductor/test_fp8.py::TestFP8Types::test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_device_cpu, test/inductor/test_fp8.py::TestFP8Types::test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_device_cuda, test/inductor/test_fp8.py::TestFP8Types::test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_device_cpu, test/inductor/test_fp8.py::TestFP8Types::test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_device_cuda, test/inductor/test_fp8.py::TestFP8Types::test_to_fp8_saturated_float32_float8_e5m2_shape_16,16,16_device_cpu, test/inductor/test_fp8.py::TestFP8Types::test_to_fp8_saturated_float32_float8_e5m2_shape_16,16,16_device_cuda, test/inductor/test_fp8.py::TestFP8Types::test_to_fp8_saturated_float32_float8_e5m2_shape_4,2048,4096_device_cpu, test/inductor/test_fp8.py::TestFP8Types::test_to_fp8_saturated_float32_float8_e5m2_shape_4,2048,4096_device_cuda, test/inductor/test_fp8.py::TestFP8Types::test_valid_cast_bfloat16_shape_15,3,13_dst_types0_device_cpu, test/inductor/test_fp8.py::TestFP8Types::test_valid_cast_bfloat16_shape_15,3,13_dst_types0_device_cuda, test/inductor/test_fp8.py::TestFP8Types::test_valid_cast_bfloat16_shape_4,2048,4096_dst_types0_device_cpu, test/inductor/test_fp8.py::TestFP8Types::test_valid_cast_bfloat16_shape_4,2048,4096_dst_types0_device_cuda, test/inductor/test_fp8.py::TestFP8Types::test_valid_cast_float16_shape_15,3,13_dst_types0_device_cpu, test/inductor/test_fp8.py::TestFP8Types::test_valid_cast_float16_shape_15,3,13_dst_types0_device_cuda, test/inductor/test_fp8.py::TestFP8Types::test_valid_cast_float16_shape_4,2048,4096_dst_types0_device_cpu, test/inductor/test_fp8.py::TestFP8Types::test_valid_cast_float16_shape_4,2048,4096_dst_types0_device_cuda, test/inductor/test_fp8.py::TestFP8Types::test_valid_cast_float32_shape_15,3,13_dst_types0_device_cpu, test/inductor/test_fp8.py::TestFP8Types::test_valid_cast_float32_shape_15,3,13_dst_types0_device_cuda, test/inductor/test_fp8.py::TestFP8Types::test_valid_cast_float32_shape_4,2048,4096_dst_types0_device_cpu, test/inductor/test_fp8.py::TestFP8Types::test_valid_cast_float32_shape_4,2048,4096_dst_types0_device_cuda, test/inductor/test_fp8.py::TestFP8Types::test_xblock_for_small_numel_float8_e4m3fn_device_cpu, test/inductor/test_fp8.py::TestFP8Types::test_xblock_for_small_numel_float8_e4m3fn_device_cuda, test/inductor/test_fp8.py::TestFP8Types::test_xblock_for_small_numel_float8_e5m2_device_cpu, test/inductor/test_fp8.py::TestFP8Types::test_xblock_for_small_numel_float8_e5m2_device_cuda, test/inductor/test_fp8.py::TestFP8Lowering::test_mx_fp8_max_autotune, test/inductor/test_fp8.py::TestFP8Lowering::test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_16_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_False_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_True_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_False_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_True_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_False_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_True_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_False_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_True_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_False_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_True_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_False_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_True_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_1024_N_16_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_tensorwise_scaling_bfloat16_shape_1024,1024,512_has_bias_False_use_fast_accum_False_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_tensorwise_scaling_bfloat16_shape_1024,1024,512_has_bias_False_use_fast_accum_True_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_tensorwise_scaling_bfloat16_shape_1024,1024,512_has_bias_True_use_fast_accum_False_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_tensorwise_scaling_bfloat16_shape_1024,1024,512_has_bias_True_use_fast_accum_True_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_tensorwise_scaling_bfloat16_shape_16,16,32_has_bias_False_use_fast_accum_False_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_tensorwise_scaling_bfloat16_shape_16,16,32_has_bias_False_use_fast_accum_True_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_tensorwise_scaling_bfloat16_shape_16,16,32_has_bias_True_use_fast_accum_False_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_tensorwise_scaling_bfloat16_shape_16,16,32_has_bias_True_use_fast_accum_True_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_tensorwise_scaling_bfloat16_shape_16,32,32_has_bias_False_use_fast_accum_False_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_tensorwise_scaling_bfloat16_shape_16,32,32_has_bias_False_use_fast_accum_True_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_tensorwise_scaling_bfloat16_shape_16,32,32_has_bias_True_use_fast_accum_False_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_tensorwise_scaling_bfloat16_shape_16,32,32_has_bias_True_use_fast_accum_True_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_tensorwise_scaling_float32_shape_1024,1024,512_has_bias_False_use_fast_accum_False_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_tensorwise_scaling_float32_shape_1024,1024,512_has_bias_False_use_fast_accum_True_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_tensorwise_scaling_float32_shape_1024,1024,512_has_bias_True_use_fast_accum_False_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_tensorwise_scaling_float32_shape_1024,1024,512_has_bias_True_use_fast_accum_True_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_tensorwise_scaling_float32_shape_16,16,32_has_bias_False_use_fast_accum_False_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_tensorwise_scaling_float32_shape_16,16,32_has_bias_False_use_fast_accum_True_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_tensorwise_scaling_float32_shape_16,16,32_has_bias_True_use_fast_accum_False_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_tensorwise_scaling_float32_shape_16,16,32_has_bias_True_use_fast_accum_True_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_tensorwise_scaling_float32_shape_16,32,32_has_bias_False_use_fast_accum_False_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_tensorwise_scaling_float32_shape_16,32,32_has_bias_False_use_fast_accum_True_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_tensorwise_scaling_float32_shape_16,32,32_has_bias_True_use_fast_accum_False_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_tensorwise_scaling_float32_shape_16,32,32_has_bias_True_use_fast_accum_True_persistent_matmul_False, test/inductor/test_fp8.py::TestFP8Lowering::test_unacceptable_input_dims, test/inductor/test_fp8.py::TestFP8Lowering::test_unacceptable_scale_dims_rowwise_scaling 2025-08-14T22:27:33.4364176Z 2025-08-14T22:27:33.4364563Z Running inductor/test_debug_trace 1/1 ... [2025-08-14 22:27:33.406700] 2025-08-14T22:27:33.4365307Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T22:27:33.4367174Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_debug_trace.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 22:27:33.407307] 2025-08-14T22:27:48.7532382Z 2025-08-14T22:27:48.7533827Z inductor/test_debug_trace 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_debug_trace_1.1_09636b3772f5587c_.log 2025-08-14T22:27:48.7535989Z Running 3 items in this shard: test/inductor/test_debug_trace.py::TestDebugTrace::test_debug_multi_tempalte, test/inductor/test_debug_trace.py::TestDebugTrace::test_debug_printer_const, test/inductor/test_debug_trace.py::TestDebugTrace::test_debug_trace 2025-08-14T22:27:48.7536949Z 2025-08-14T22:27:48.7537108Z Running inductor/test_op_dtype_prop 2/2 ... [2025-08-14 22:27:48.752586] 2025-08-14T22:27:48.7537425Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T22:27:48.7539581Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_op_dtype_prop.py', '-m', 'not serial', '--shard-id=2', '--num-shards=2', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 22:27:48.753176] 2025-08-14T22:29:18.3690156Z 2025-08-14T22:29:18.3693599Z inductor/test_flex_decoding 4/4 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_flex_decoding_4.4_df3c66d248695854_.log 2025-08-14T22:29:18.3777328Z Running 150 items in this shard: test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_bfloat16_score_mod2_head_dims0_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_bfloat16_score_mod5_head_dims0_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_bfloat16_score_mod6_head_dims2_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_bfloat16_score_mod8_head_dims2_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_bfloat16_score_mod0_BLOCK_SIZE2_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_bfloat16_score_mod2_BLOCK_SIZE2_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_bfloat16_score_mod2_BLOCK_SIZE_128_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_bfloat16_score_mod3_BLOCK_SIZE_64_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_bfloat16_score_mod4_BLOCK_SIZE2_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_bfloat16_score_mod4_BLOCK_SIZE_128_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_bfloat16_score_mod6_BLOCK_SIZE_128_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_bfloat16_score_mod6_BLOCK_SIZE_64_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_bfloat16_score_mod7_BLOCK_SIZE_64_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float16_score_mod0_BLOCK_SIZE_64_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float16_score_mod1_BLOCK_SIZE2_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float16_score_mod1_BLOCK_SIZE_128_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float16_score_mod2_BLOCK_SIZE2_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float16_score_mod3_BLOCK_SIZE_128_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float16_score_mod4_BLOCK_SIZE3_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float16_score_mod4_BLOCK_SIZE_128_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float16_score_mod5_BLOCK_SIZE2_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float16_score_mod5_BLOCK_SIZE_64_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float16_score_mod6_BLOCK_SIZE2_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float16_score_mod7_BLOCK_SIZE_64_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float16_score_mod8_BLOCK_SIZE_128_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float32_score_mod0_BLOCK_SIZE3_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float32_score_mod0_BLOCK_SIZE_128_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float32_score_mod0_BLOCK_SIZE_64_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float32_score_mod1_BLOCK_SIZE_64_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float32_score_mod2_BLOCK_SIZE2_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float32_score_mod2_BLOCK_SIZE3_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float32_score_mod2_BLOCK_SIZE_128_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float32_score_mod2_BLOCK_SIZE_64_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float32_score_mod3_BLOCK_SIZE2_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float32_score_mod3_BLOCK_SIZE_64_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float32_score_mod4_BLOCK_SIZE3_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float32_score_mod4_BLOCK_SIZE_64_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float32_score_mod5_BLOCK_SIZE2_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float32_score_mod5_BLOCK_SIZE_128_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float32_score_mod6_BLOCK_SIZE_128_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float32_score_mod8_BLOCK_SIZE2_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_float16_score_mod1_head_dims1_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_float16_score_mod2_head_dims1_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_float16_score_mod2_head_dims2_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_float16_score_mod3_head_dims1_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_float16_score_mod6_head_dims0_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_float32_score_mod1_head_dims0_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_float32_score_mod3_head_dims2_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_float32_score_mod4_head_dims1_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_float32_score_mod4_head_dims2_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_float32_score_mod5_head_dims0_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_float32_score_mod5_head_dims2_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_float32_score_mod6_head_dims0_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_float32_score_mod6_head_dims1_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_float32_score_mod7_head_dims0_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_float32_score_mod8_head_dims2_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_decode_at_different_input_position_float16_score_mod0_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_decode_at_different_input_position_float16_score_mod4_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_function_composition_bfloat16_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_head_dependent_mask_mod_float16_score_mod1_head_dims0_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_head_dependent_mask_mod_float16_score_mod3_head_dims0_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_head_dependent_mask_mod_float16_score_mod3_head_dims2_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_head_dependent_mask_mod_float16_score_mod5_head_dims1_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_head_dependent_mask_mod_float16_score_mod6_head_dims0_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_head_dependent_mask_mod_float16_score_mod7_head_dims1_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_head_dependent_mask_mod_float16_score_mod8_head_dims2_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims0_batch_dims1_score_mod2_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims0_batch_dims2_score_mod5_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims0_batch_dims3_score_mod3_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims1_batch_dims0_score_mod3_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims1_batch_dims0_score_mod6_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims1_batch_dims0_score_mod8_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims1_batch_dims1_score_mod0_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims1_batch_dims1_score_mod2_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims1_batch_dims1_score_mod5_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims1_batch_dims2_score_mod3_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims1_batch_dims2_score_mod5_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims1_batch_dims3_score_mod5_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims1_batch_dims3_score_mod6_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims2_batch_dims0_score_mod2_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims2_batch_dims0_score_mod3_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims2_batch_dims0_score_mod7_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims2_batch_dims0_score_mod8_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims2_batch_dims1_score_mod1_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims2_batch_dims1_score_mod8_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims2_batch_dims2_score_mod6_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims2_batch_dims2_score_mod8_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims2_batch_dims3_score_mod2_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims2_batch_dims3_score_mod3_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims2_batch_dims3_score_mod4_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_load_from_bias_head_seq_batch_float16_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_load_from_bias_seq_batch_float16_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_max_autotune_cuda, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_max_autotune_with_captured_cuda, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_multiple_score_mod_calls_cuda, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_njt_causal_bfloat16_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_njt_causal_float16_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_non_divisible_multi_token_offset_mask_with_captured_buffer_cuda, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_non_equal_head_dims_score_mod0_bfloat16_head_dims1_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_non_equal_head_dims_score_mod0_float16_head_dims1_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_non_equal_head_dims_score_mod1_bfloat16_head_dims0_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_non_equal_head_dims_score_mod1_float16_head_dims0_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_non_equal_head_dims_score_mod1_float32_head_dims1_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_non_equal_head_dims_score_mod2_float16_head_dims1_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_non_equal_head_dims_score_mod2_float32_head_dims1_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_non_equal_head_dims_score_mod3_bfloat16_head_dims0_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_non_equal_head_dims_score_mod3_float16_head_dims0_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_non_equal_head_dims_score_mod4_float16_head_dims1_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_non_equal_head_dims_score_mod5_float16_head_dims1_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_non_equal_head_dims_score_mod6_float16_head_dims0_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_non_equal_head_dims_score_mod7_float32_head_dims1_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_non_equal_head_dims_score_mod8_float32_head_dims0_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_non_sparse_mulitple_block_size_cuda, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod0_head_dims0_page_size_128_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod1_head_dims0_page_size_128_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod1_head_dims0_page_size_256_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod1_head_dims1_page_size_64_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod1_head_dims2_page_size_256_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod2_head_dims0_page_size_128_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod2_head_dims0_page_size_256_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod2_head_dims1_page_size_128_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod2_head_dims2_page_size_256_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod3_head_dims0_page_size_256_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod3_head_dims1_page_size_128_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod4_head_dims0_page_size_256_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod5_head_dims0_page_size_64_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod5_head_dims1_page_size_256_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod5_head_dims2_page_size_64_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod6_head_dims0_page_size_64_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod6_head_dims1_page_size_128_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod7_head_dims0_page_size_256_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod7_head_dims0_page_size_64_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod7_head_dims2_page_size_256_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod8_head_dims2_page_size_256_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_seq_masking_float16_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_silu_on_score_float16_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_skip_odd_keys_float16_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_strided_inputs_float16_k_s0_v_s1_head_dims0_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_strided_inputs_float16_k_s0_v_s1_head_dims1_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_strided_inputs_float16_k_s0_v_s3_head_dims0_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_strided_inputs_float16_k_s1_v_s0_head_dims2_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_strided_inputs_float16_k_s1_v_s1_head_dims1_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_strided_inputs_float16_k_s2_v_s0_head_dims0_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_strided_inputs_float16_k_s2_v_s0_head_dims2_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_strided_inputs_float16_k_s3_v_s0_head_dims0_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_strided_inputs_float16_k_s3_v_s0_head_dims1_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_strided_inputs_float16_k_s3_v_s1_head_dims2_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_strided_inputs_float16_k_s3_v_s2_head_dims2_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_strided_inputs_float16_k_s3_v_s3_head_dims1_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_windowed_full_mask_vs_sdpa_paged_attention_cuda 2025-08-14T22:29:18.3881582Z 2025-08-14T22:29:18.3881940Z Running inductor/test_analysis 1/1 ... [2025-08-14 22:29:18.369295] 2025-08-14T22:29:18.3882694Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T22:29:18.3884639Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_analysis.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 22:29:18.369727] 2025-08-14T22:29:24.4986759Z 2025-08-14T22:29:24.4987837Z inductor/test_analysis 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_analysis_1.1_2f16e7810d0e8686_.log 2025-08-14T22:29:24.4996748Z Running 26 items in this shard: test/inductor/test_analysis.py::TestUtils::test_tabulate2d, test/inductor/test_analysis.py::TestUtils::test_zip_dicts, test/inductor/test_analysis.py::TestAnalysisCUDA::test_augment_trace_against_flop_counter_maxat0_cuda_float16, test/inductor/test_analysis.py::TestAnalysisCUDA::test_augment_trace_against_flop_counter_maxat0_cuda_float32, test/inductor/test_analysis.py::TestAnalysisCUDA::test_augment_trace_against_flop_counter_maxat1_cuda_float16, test/inductor/test_analysis.py::TestAnalysisCUDA::test_augment_trace_against_flop_counter_maxat1_cuda_float32, test/inductor/test_analysis.py::TestAnalysisCUDA::test_augment_trace_against_flop_counter_maxat2_cuda_float16, test/inductor/test_analysis.py::TestAnalysisCUDA::test_augment_trace_against_flop_counter_maxat2_cuda_float32, test/inductor/test_analysis.py::TestAnalysisCUDA::test_augment_trace_against_flop_counter_maxat3_cuda_float16, test/inductor/test_analysis.py::TestAnalysisCUDA::test_augment_trace_against_flop_counter_maxat3_cuda_float32, test/inductor/test_analysis.py::TestAnalysisCUDA::test_augment_trace_helper_unit_cuda, test/inductor/test_analysis.py::TestAnalysisCUDA::test_diff_cuda_float16, test/inductor/test_analysis.py::TestAnalysisCUDA::test_diff_cuda_float32, test/inductor/test_analysis.py::TestAnalysisCUDA::test_diff_cuda_float64, test/inductor/test_analysis.py::TestAnalysisCUDA::test_noop_cuda, test/inductor/test_analysis.py::TestAnalysisCUDA::test_pointwise_bandwidth_maxat0_cuda_float16, test/inductor/test_analysis.py::TestAnalysisCUDA::test_pointwise_bandwidth_maxat0_cuda_float32, test/inductor/test_analysis.py::TestAnalysisCUDA::test_pointwise_bandwidth_maxat1_cuda_float16, test/inductor/test_analysis.py::TestAnalysisCUDA::test_pointwise_bandwidth_maxat1_cuda_float32, test/inductor/test_analysis.py::TestAnalysisCUDA::test_pointwise_bandwidth_maxat2_cuda_float16, test/inductor/test_analysis.py::TestAnalysisCUDA::test_pointwise_bandwidth_maxat2_cuda_float32, test/inductor/test_analysis.py::TestAnalysisCUDA::test_pointwise_bandwidth_maxat3_cuda_float16, test/inductor/test_analysis.py::TestAnalysisCUDA::test_pointwise_bandwidth_maxat3_cuda_float32, test/inductor/test_analysis.py::TestAnalysisCUDA::test_triton_has_metadata_maxat0_cuda_float16, test/inductor/test_analysis.py::TestAnalysisCUDA::test_triton_has_metadata_maxat0_cuda_float32, test/inductor/test_analysis.py::TestAnalysisCUDA::test_triton_has_metadata_maxat0_cuda_float64 2025-08-14T22:29:24.5006116Z 2025-08-14T22:29:24.5006298Z Running export/test_unflatten 1/1 ... [2025-08-14 22:29:24.498609] 2025-08-14T22:29:24.5006669Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T22:29:24.5007837Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'export/test_unflatten.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 22:29:24.499267] 2025-08-14T22:29:42.8537429Z 2025-08-14T22:29:42.8542875Z export/test_unflatten 1/1 was successful, full logs can be found in artifacts with path test/test-reports/export.test_unflatten_1.1_908b6ff995f6e1fa_.log 2025-08-14T22:29:42.8557725Z Running 28 items in this shard: test/export/test_unflatten.py::TestUnflatten::test_assert_tensor_metadata_stack, test/export/test_unflatten.py::TestUnflatten::test_attr_as_submod_input, test/export/test_unflatten.py::TestUnflatten::test_dedup_sym_size, test/export/test_unflatten.py::TestUnflatten::test_double_nested_submodule, test/export/test_unflatten.py::TestUnflatten::test_duplicate_placeholder, test/export/test_unflatten.py::TestUnflatten::test_fx_trace, test/export/test_unflatten.py::TestUnflatten::test_nested_leaf_non_strict, test/export/test_unflatten.py::TestUnflatten::test_placeholder_and_get_attr_ordering_after_unflattened, test/export/test_unflatten.py::TestUnflatten::test_simple_alias, test/export/test_unflatten.py::TestUnflatten::test_unflatten_buffer_mutation, test/export/test_unflatten.py::TestUnflatten::test_unflatten_constant_obj, test/export/test_unflatten.py::TestUnflatten::test_unflatten_constant_tensor, test/export/test_unflatten.py::TestUnflatten::test_unflatten_container_type, test/export/test_unflatten.py::TestUnflatten::test_unflatten_eager, test/export/test_unflatten.py::TestUnflatten::test_unflatten_empty_branch, test/export/test_unflatten.py::TestUnflatten::test_unflatten_nested, test/export/test_unflatten.py::TestUnflatten::test_unflatten_nested_access, test/export/test_unflatten.py::TestUnflatten::test_unflatten_none, test/export/test_unflatten.py::TestUnflatten::test_unflatten_param_list_dict, test/export/test_unflatten.py::TestUnflatten::test_unflatten_preserve_signature, test/export/test_unflatten.py::TestUnflatten::test_unflatten_preserve_with_unused_input, test/export/test_unflatten.py::TestUnflatten::test_unflatten_requires_grad_param, test/export/test_unflatten.py::TestUnflatten::test_unflatten_shared_submodule, test/export/test_unflatten.py::TestUnflatten::test_unflatten_skipped_call_module, test/export/test_unflatten.py::TestUnflatten::test_unflatten_submodule_ordering, test/export/test_unflatten.py::TestUnflatten::test_unflatten_with_inplace_compile, test/export/test_unflatten.py::TestUnflatten::test_unflatten_wrong_input, test/export/test_unflatten.py::TestUnflatten::test_unflattened_module_nodes_has_meta_val 2025-08-14T22:29:42.8572447Z 2025-08-14T22:29:42.8572774Z Running dynamo/test_interop 1/1 ... [2025-08-14 22:29:42.853693] 2025-08-14T22:29:42.8573460Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T22:29:42.8575173Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_interop.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 22:29:42.854339] 2025-08-14T22:29:47.1289876Z 2025-08-14T22:29:47.1290829Z dynamo/test_interop 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_interop_1.1_f17e2e18cb8b8aee_.log 2025-08-14T22:29:47.1292530Z Running 5 items in this shard: test/dynamo/test_interop.py::InteropTests::test_fx_fn, test/dynamo/test_interop.py::InteropTests::test_script_fn, test/dynamo/test_interop.py::InteropTests::test_staticmethod_script_fn, test/dynamo/test_interop.py::InteropTests::test_trace_fn, test/dynamo/test_interop.py::InteropTests::test_vmap_in_graph 2025-08-14T22:29:47.1293705Z 2025-08-14T22:29:47.1294262Z Running inductor/test_quantization 1/1 ... [2025-08-14 22:29:47.128882] 2025-08-14T22:29:47.1294680Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T22:29:47.1299756Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_quantization.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 22:29:47.129473] 2025-08-14T22:30:03.0786234Z 2025-08-14T22:30:03.0787496Z inductor/test_quantization 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_quantization_1.1_202ee78e2a2186fd_.log 2025-08-14T22:30:03.0789729Z Running 2 items in this shard: test/inductor/test_quantization.py::TestQuantization::test_activation_quantization_aten_with_scaling, test/inductor/test_quantization.py::TestQuantization::test_activation_quantization_aten_without_scaling 2025-08-14T22:30:03.0791156Z 2025-08-14T22:30:03.0791997Z Running inductor/test_gpu_cpp_wrapper 1/2 ... [2025-08-14 22:30:03.078521] 2025-08-14T22:30:03.0792689Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T22:30:03.0794376Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_gpu_cpp_wrapper.py', '-m', 'not serial', '--shard-id=1', '--num-shards=2', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 22:30:03.078878] 2025-08-14T22:32:36.6772854Z 2025-08-14T22:32:36.6774360Z inductor/test_op_dtype_prop 2/2 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_op_dtype_prop_2.2_361dadd6ca9640c3_.log 2025-08-14T22:32:36.7146474Z Running 273 items in this shard: test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_any_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_assoc_scan_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_binary_math_mixed_precision_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_codegen_upcast_to_fp32_upcast_to_fp32_False_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_codegen_upcast_to_fp32_upcast_to_fp32_True_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_constant_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_downcast_div_mod_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_abs_load_upcast_to_fp32_False_float16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_acos_load_upcast_to_fp32_False_float16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_acos_load_upcast_to_fp32_True_bfloat16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_asin_load_upcast_to_fp32_False_bfloat16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_asin_load_upcast_to_fp32_True_bfloat16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_asinh_load_upcast_to_fp32_True_bfloat16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_atan2_load_upcast_to_fp32_False_bfloat16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_atan2_load_upcast_to_fp32_True_bfloat16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_atan_load_upcast_to_fp32_False_float16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_atan_load_upcast_to_fp32_True_bfloat16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_atanh_load_upcast_to_fp32_False_bfloat16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_ceil_load_upcast_to_fp32_True_bfloat16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_ceil_load_upcast_to_fp32_True_float16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_copysign_load_upcast_to_fp32_False_float16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_copysign_load_upcast_to_fp32_True_bfloat16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_cos_load_upcast_to_fp32_False_bfloat16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_cos_load_upcast_to_fp32_False_float16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_cos_load_upcast_to_fp32_True_bfloat16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_cos_load_upcast_to_fp32_True_float16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_cosh_load_upcast_to_fp32_True_float16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_erf_load_upcast_to_fp32_False_bfloat16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_erf_load_upcast_to_fp32_True_bfloat16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_erf_load_upcast_to_fp32_True_float16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_erfc_load_upcast_to_fp32_False_float16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_erfinv_load_upcast_to_fp32_True_float16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_exp2_load_upcast_to_fp32_False_bfloat16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_exp2_load_upcast_to_fp32_True_bfloat16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_exp_load_upcast_to_fp32_True_float16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_expm1_load_upcast_to_fp32_False_bfloat16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_expm1_load_upcast_to_fp32_False_float16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_floor_load_upcast_to_fp32_False_bfloat16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_floor_load_upcast_to_fp32_True_float16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_fmod_load_upcast_to_fp32_False_float16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_fmod_load_upcast_to_fp32_True_bfloat16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_hypot_load_upcast_to_fp32_False_bfloat16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_hypot_load_upcast_to_fp32_True_bfloat16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_hypot_load_upcast_to_fp32_True_float16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_isinf_load_upcast_to_fp32_False_bfloat16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_isinf_load_upcast_to_fp32_False_float16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_isinf_load_upcast_to_fp32_True_float16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_isnan_load_upcast_to_fp32_False_float16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_lgamma_load_upcast_to_fp32_False_bfloat16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_lgamma_load_upcast_to_fp32_False_float16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_lgamma_load_upcast_to_fp32_True_float16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_log10_load_upcast_to_fp32_False_float16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_log10_load_upcast_to_fp32_True_bfloat16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_log1p_load_upcast_to_fp32_True_bfloat16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_log2_load_upcast_to_fp32_False_float16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_log_load_upcast_to_fp32_False_float16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_log_load_upcast_to_fp32_True_float16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_nextafter_load_upcast_to_fp32_False_bfloat16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_nextafter_load_upcast_to_fp32_False_float16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_pow_load_upcast_to_fp32_False_float16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_pow_load_upcast_to_fp32_True_bfloat16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_pow_load_upcast_to_fp32_True_float16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_round_load_upcast_to_fp32_False_bfloat16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_round_load_upcast_to_fp32_True_float16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_rsqrt_load_upcast_to_fp32_False_float16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_rsqrt_load_upcast_to_fp32_True_float16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_sigmoid_load_upcast_to_fp32_False_bfloat16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_sin_load_upcast_to_fp32_True_float16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_sinh_load_upcast_to_fp32_True_float16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_sqrt_load_upcast_to_fp32_False_bfloat16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_sqrt_load_upcast_to_fp32_True_bfloat16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_sqrt_load_upcast_to_fp32_True_float16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_tan_load_upcast_to_fp32_False_float16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_tan_load_upcast_to_fp32_True_bfloat16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_tanh_load_upcast_to_fp32_True_float16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_trunc_load_upcast_to_fp32_False_bfloat16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_trunc_load_upcast_to_fp32_True_bfloat16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_dtype_aware_codegen_op_name_trunc_load_upcast_to_fp32_True_float16_cuda, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_abs_cuda_int32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_acos_cuda_float64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_acos_cuda_int32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_acosh_cuda_float64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_acosh_cuda_int32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_add_cuda_float32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_add_cuda_float64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_angle_cuda_bool, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_angle_cuda_float64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_angle_cuda_int32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_asin_cuda_bool, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_asin_cuda_int64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_asinh_cuda_bool, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_asinh_cuda_float32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_asinh_cuda_float64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_asinh_cuda_int32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_atan_cuda_float64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_atan_cuda_int32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_atan_cuda_int64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_atanh_cuda_float32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_atanh_cuda_float64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_atanh_cuda_int64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_bitwise_left_shift_cuda_int32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_bitwise_left_shift_cuda_int64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_bitwise_not_cuda_int32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_bitwise_or_cuda_bool, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_bitwise_or_cuda_int32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_bitwise_xor_cuda_bool, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_bitwise_xor_cuda_int32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_ceil_cuda_float64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_ceil_cuda_int32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_clamp_max_cuda_float32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_clamp_max_cuda_int32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_clamp_min_cuda_bool, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_clamp_min_cuda_float32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_clamp_min_cuda_int32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_clone_cuda_bool, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_clone_cuda_float64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_clone_cuda_int64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_copysign_cuda_float64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_cos_cuda_bool, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_cos_cuda_float64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_cos_cuda_int32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_cosh_cuda_float32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_cosh_cuda_float64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_digamma_cuda_float32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_digamma_cuda_float64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_digamma_cuda_int64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_div_no_rounding_mode_cuda_bool, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_div_no_rounding_mode_cuda_int32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_div_no_rounding_mode_cuda_int64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_div_trunc_rounding_cuda_float32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_eq_cuda_float32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_eq_cuda_float64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_eq_cuda_int32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_eq_cuda_int64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_erf_cuda_bool, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_erf_cuda_float32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_erf_cuda_float64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_erf_cuda_int32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_erfc_cuda_bool, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_erfc_cuda_float32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_erfc_cuda_float64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_erfinv_cuda_float32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_erfinv_cuda_int64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_exp2_cuda_bool, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_exp2_cuda_float32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_exp2_cuda_int32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_exp_cuda_float64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_exp_cuda_int32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_expm1_cuda_bool, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_expm1_cuda_float32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_expm1_cuda_float64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_expm1_cuda_int32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_expm1_cuda_int64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_gcd_cuda_int64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_ge_cuda_float32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_ge_cuda_int32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_gt_cuda_bool, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_gt_cuda_float64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_hypot_cuda_float32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_i0_cuda_bool, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_i0_cuda_float32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_i0_cuda_float64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_igamma_cuda_float64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_igammac_cuda_float32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_igammac_cuda_float64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_isinf_cuda_float32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_isinf_cuda_int32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_isnan_cuda_bool, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_isnan_cuda_float32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_le_cuda_bool, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_le_cuda_float32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_le_cuda_int64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_lgamma_cuda_bool, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_lgamma_cuda_int32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_log1p_cuda_bool, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_log1p_cuda_int32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_log2_cuda_int32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_log2_cuda_int64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_log_cuda_bool, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_log_cuda_int32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_log_cuda_int64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_logical_and_cuda_float32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_logical_and_cuda_int64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_logical_not_cuda_float32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_logical_not_cuda_int64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_logical_or_cuda_bool, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_logical_or_cuda_int32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_logical_xor_cuda_bool, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_logical_xor_cuda_float64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_logical_xor_cuda_int32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_logical_xor_cuda_int64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_lt_cuda_bool, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_lt_cuda_float32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_lt_cuda_float64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_lt_cuda_int64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_max_binary_cuda_bool, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_max_binary_cuda_float32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_max_binary_cuda_float64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_max_binary_cuda_int64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_maximum_cuda_bool, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_maximum_cuda_float32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_maximum_cuda_float64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_min_binary_cuda_float32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_min_binary_cuda_float64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_minimum_cuda_float32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_minimum_cuda_float64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_mul_cuda_bool, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_mul_cuda_float32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_mul_cuda_float64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_neg_cuda_float32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_neg_cuda_float64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_nextafter_cuda_float32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_polygamma_polygamma_n_0_cuda_bool, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_polygamma_polygamma_n_0_cuda_float64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_polygamma_polygamma_n_1_cuda_int32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_polygamma_polygamma_n_2_cuda_bool, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_polygamma_polygamma_n_2_cuda_float64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_polygamma_polygamma_n_2_cuda_int64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_polygamma_polygamma_n_3_cuda_int64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_polygamma_polygamma_n_4_cuda_float32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_polygamma_polygamma_n_4_cuda_float64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_polygamma_polygamma_n_4_cuda_int32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_polygamma_polygamma_n_4_cuda_int64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_pow_cuda_float32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_reciprocal_cuda_bool, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_reciprocal_cuda_int64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_remainder_cuda_float32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_remainder_cuda_float64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_remainder_cuda_int32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_remainder_cuda_int64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_round_cuda_float32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_round_cuda_float64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_round_cuda_int64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_round_decimals_0_cuda_float64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_round_decimals_3_cuda_float32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_round_decimals_neg_3_cuda_float64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_rsqrt_cuda_bool, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_rsqrt_cuda_float32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_rsqrt_cuda_float64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_sigmoid_cuda_bool, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_sigmoid_cuda_float32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_sign_cuda_bool, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_sign_cuda_float32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_sign_cuda_int32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_sign_cuda_int64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_signbit_cuda_bool, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_signbit_cuda_float32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_signbit_cuda_float64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_signbit_cuda_int64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_sin_cuda_bool, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_sin_cuda_float64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_sin_cuda_int32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_sinh_cuda_bool, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_sinh_cuda_float32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_sinh_cuda_float64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_sinh_cuda_int32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_sqrt_cuda_int64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_square_cuda_float32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_square_cuda_int64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_sub_cuda_float32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_sub_cuda_int32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_sub_cuda_int64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_tan_cuda_float32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_tan_cuda_float64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_tan_cuda_int64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_tanh_cuda_bool, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_tanh_cuda_float32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_trunc_cuda_float64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_where_cuda_bool, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_where_cuda_float32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_where_cuda_float64, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_where_cuda_int32, test/inductor/test_op_dtype_prop.py::TestCaseCUDA::test_op_dtype_propagation_where_cuda_int64 2025-08-14T22:32:36.7467302Z 2025-08-14T22:32:36.7467677Z Running inductor/test_gpu_cpp_wrapper 2/2 ... [2025-08-14 22:32:36.677483] 2025-08-14T22:32:36.7468333Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T22:32:36.7469783Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_gpu_cpp_wrapper.py', '-m', 'not serial', '--shard-id=2', '--num-shards=2', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 22:32:36.677788] 2025-08-14T22:35:27.1766451Z 2025-08-14T22:35:27.1767702Z inductor/test_gpu_cpp_wrapper 1/2 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_gpu_cpp_wrapper_1.2_c20e4dd7cb2329aa_.log 2025-08-14T22:35:27.1819737Z Running 150 items in this shard: test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_adding_tensor_offsets_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_batch_norm_2d_2_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_bmm1_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_buffer_use_after_remove_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_cat_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_cat_slice_cat_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_conv_backward_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_convolution1_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_custom_op_1_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_bfloat16_float64_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_bfloat16_int64_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_bfloat16_uint8_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float16_float32_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float16_float64_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float16_int32_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float16_int64_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float16_uint8_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float32_bfloat16_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float32_float32_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float32_int64_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float32_int8_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float64_bfloat16_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float64_float64_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float64_int32_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float64_int64_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float64_uint8_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int16_float16_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int16_float32_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int16_float64_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int16_int16_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int16_uint8_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int32_float32_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int32_float64_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int32_int64_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int64_bfloat16_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int64_float32_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int64_float64_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int64_int16_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int64_uint8_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int8_bfloat16_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int8_float32_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int8_float64_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int8_int32_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int8_int64_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int8_uint8_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_uint8_bfloat16_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_uint8_float32_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_uint8_int16_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_uint8_int64_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dynamic_shapes_persistent_reduction_mixed_x_dim_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_embedding_bag_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_fft_real_input_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_fft_real_input_real_output_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_index_tensor_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_layer_norm_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_linear_relu_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_mm_plus_mm2_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_mm_views_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_multi_device_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_pointwise_hermite_polynomial_h_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_pointwise_hermite_polynomial_he_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_pow3_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_reduction1_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_repeat_interleave_2_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_scalar_input_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_scaled_dot_product_attention_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_scaled_dot_product_efficient_attention_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_unspec_inputs_float64_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_unspec_inputs_int16_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_unspec_inputs_int64_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_unspec_inputs_uint8_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_add_complex_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_adding_tensor_offsets_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_as_strided_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_batch_norm_2d_2_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_bernoulli1_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_bitwise_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_bmm2_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_buffer_use_after_remove_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_cat_slice_cat_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_consecutive_split_cumprod_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_convolution1_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_custom_op_2_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_custom_op_3_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_bfloat16_bfloat16_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_bfloat16_float64_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_bfloat16_int16_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_bfloat16_int32_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float16_bfloat16_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float16_int16_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float16_int64_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float32_bfloat16_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float32_float64_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float32_int16_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float32_int32_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float32_int64_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float32_int8_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float32_uint8_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float64_bfloat16_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float64_float16_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float64_float32_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float64_int64_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_fusion_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int16_float16_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int16_int64_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int32_bfloat16_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int32_float16_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int32_float32_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int32_float64_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int32_int16_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int32_int64_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int32_int8_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int64_bfloat16_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int64_float16_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int64_float32_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int64_int64_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int8_float16_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int8_float32_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int8_float64_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int8_int32_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int8_int64_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int8_uint8_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_uint8_bfloat16_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_uint8_float16_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_uint8_int16_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_uint8_int32_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_uint8_int64_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_uint8_uint8_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dynamic_shapes_persistent_reduction_mixed_x_dim_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_embedding_bag_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_fft_real_input_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_fft_real_input_real_output_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_index_put_deterministic_fallback_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_index_tensor_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_linear2_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_mm_views_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_multi_device_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_multi_threading_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_pointwise_hermite_polynomial_h_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_reduction1_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_relu_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_scalar_input_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_scaled_dot_product_efficient_attention_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_sort_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_transpose_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_unspec_inputs_bfloat16_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_unspec_inputs_float16_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_unspec_inputs_float64_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_unspec_inputs_int64_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_unspec_inputs_uint8_cuda_dynamic_shapes_gpu_wrapper 2025-08-14T22:35:27.1867293Z 2025-08-14T22:35:27.1867436Z Running test_quantization 5/9 ... [2025-08-14 22:35:27.177268] 2025-08-14T22:35:27.1867734Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T22:35:27.1868493Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_quantization.py', '-m', 'not serial', '--shard-id=5', '--num-shards=9', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 22:35:27.177750] 2025-08-14T22:37:59.5970259Z 2025-08-14T22:37:59.5972250Z inductor/test_gpu_cpp_wrapper 2/2 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_gpu_cpp_wrapper_2.2_69a6b7de0e4a5cea_.log 2025-08-14T22:37:59.6025540Z Running 144 items in this shard: test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_add_complex4_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_add_complex_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_addmm_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_aoti_debug_printer_works_on_constants, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_as_strided_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_bernoulli1_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_bitwise_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_bmm2_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_consecutive_split_cumprod_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_custom_op_2_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_custom_op_3_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_bfloat16_bfloat16_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_bfloat16_float16_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_bfloat16_float32_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_bfloat16_int16_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_bfloat16_int32_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_bfloat16_int8_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float16_bfloat16_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float16_float16_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float16_int16_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float16_int8_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float32_float16_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float32_float64_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float32_int16_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float32_int32_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float32_uint8_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float64_float16_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float64_float32_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float64_int16_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float64_int8_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_fusion_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int16_bfloat16_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int16_int32_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int16_int64_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int16_int8_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int32_bfloat16_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int32_float16_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int32_int16_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int32_int32_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int32_int8_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int32_uint8_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int64_float16_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int64_int32_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int64_int64_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int64_int8_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int8_float16_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int8_int16_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int8_int8_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_uint8_float16_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_uint8_float64_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_uint8_int32_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_uint8_int8_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_uint8_uint8_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_enable_dynamic_shapes_cpp_wrapper_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_foreach_cpp_wrapper_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_index_put_deterministic_fallback_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_inductor_layout_optimization_input_mutations_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_insignificant_strides_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_linear1_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_linear2_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_mm_plus_mm3_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_multi_threading_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_profiler_mark_wrapper_call_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_randint_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_relu_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_roi_align_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_silu_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_sort_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_sum_dtype_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_sum_int_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_transpose_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_unspec_inputs_bfloat16_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_unspec_inputs_float16_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_unspec_inputs_float32_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_unspec_inputs_int32_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_unspec_inputs_int8_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_add_complex4_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_addmm_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_annotation_training, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_bmm1_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_cat_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_conv_backward_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_custom_op_1_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_bfloat16_float16_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_bfloat16_float32_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_bfloat16_int64_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_bfloat16_int8_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_bfloat16_uint8_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float16_float16_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float16_float32_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float16_float64_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float16_int32_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float16_int8_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float16_uint8_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float32_float16_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float32_float32_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float64_float64_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float64_int16_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float64_int32_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float64_int8_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float64_uint8_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int16_bfloat16_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int16_float32_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int16_float64_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int16_int16_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int16_int32_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int16_int8_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int16_uint8_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int32_int32_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int32_uint8_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int64_float64_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int64_int16_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int64_int32_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int64_int8_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int64_uint8_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int8_bfloat16_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int8_int16_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int8_int8_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_uint8_float32_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_uint8_float64_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_uint8_int8_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_enable_dynamic_shapes_cpp_wrapper_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_foreach_cpp_wrapper_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_inductor_layout_optimization_input_mutations_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_insignificant_strides_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_layer_norm_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_linear1_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_linear_relu_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_mm_plus_mm2_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_mm_plus_mm3_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_pointwise_hermite_polynomial_he_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_pow3_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_profiler_mark_wrapper_call_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_randint_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_repeat_interleave_2_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_roi_align_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_scaled_dot_product_attention_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_silu_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_sum_dtype_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_sum_int_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_unspec_inputs_float32_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_unspec_inputs_int16_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_unspec_inputs_int32_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_unspec_inputs_int8_cuda_dynamic_shapes_gpu_wrapper 2025-08-14T22:37:59.6073157Z 2025-08-14T22:37:59.6103015Z Running test_quantization 8/9 ... [2025-08-14 22:37:59.597634] 2025-08-14T22:37:59.6103367Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T22:37:59.6104184Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_quantization.py', '-m', 'not serial', '--shard-id=8', '--num-shards=9', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 22:37:59.604500] 2025-08-14T22:44:49.0069955Z 2025-08-14T22:44:49.0071808Z test_quantization 5/9 was successful, full logs can be found in artifacts with path test/test-reports/test_quantization_5.9_6b4d52b51b6277e3_.log 2025-08-14T22:44:49.0105276Z Running 141 items in this shard: test/test_quantization.py::TestQuantizedOps::test_avg_pool2d, test/test_quantization.py::TestQuantizedOps::test_avg_pool2d_nhwc, test/test_quantization.py::TestQuantizedOps::test_custom_module_lstm, test/test_quantization.py::TestQuantizedOps::test_max_pool2d_pt2e, test/test_quantization.py::TestQuantizedOps::test_qadd_relu_cudnn, test/test_quantization.py::TestQNNPackOps::test_qnnpack_add_broadcast, test/test_quantization.py::TestQNNPackOps::test_qnnpack_maxpool2d, test/test_quantization.py::TestQuantizedLinear::test_qlinear_gelu_fp8, test/test_quantization.py::TestQuantizedLinear::test_qlinear_relu_pt2e, test/test_quantization.py::TestQuantizedConv::test_qconv2d_sum_pt2e, test/test_quantization.py::TestQuantizedConv::test_qconv_transpose2d, test/test_quantization.py::TestDynamicQuantizedOps::test_linear_dynamic_fp16_onednn, test/test_quantization.py::TestDynamicQuantizedOps::test_qlinear_legacy, test/test_quantization.py::TestDynamicQuantizedOps::test_unpacked_qlinear_dynamic_fp16_opcheck, test/test_quantization.py::TestDynamicQuantizedOps::test_wrapped_fbgemm_linear_fp16, test/test_quantization.py::TestComparatorOps::test_compare_tensor_scalar, test/test_quantization.py::TestPadding::test_constant_padNd, test/test_quantization.py::TestQuantizedFunctionalOps::test_grid_sample, test/test_quantization.py::TestFakeQuantizeOps::test_fake_quant_per_channel_qparam_range, test/test_quantization.py::TestFakeQuantizeOps::test_fake_quant_preserves_qparam_shapes_for_activations, test/test_quantization.py::TestFakeQuantizeOps::test_learnable_forward_per_channel_cpu, test/test_quantization.py::TestFakeQuantizeOps::test_numerical_consistency_per_tensor, test/test_quantization.py::TestFusedObsFakeQuant::test_fused_obs_fake_quant_moving_avg, test/test_quantization.py::TestQuantizedTensor::test_fp16_saturate_op, test/test_quantization.py::TestQuantizedTensor::test_per_tensor_to_device, test/test_quantization.py::TestQuantizedTensor::test_qscheme_pickle, test/test_quantization.py::TestQuantizedTensor::test_qtensor_sub_byte_not_aligned_cols, test/test_quantization.py::TestQuantizedTensor::test_qtensor_view, test/test_quantization.py::TestObserver::test_observer_qparams_respects_device_affinity, test/test_quantization.py::TestObserver::test_zero_numel, test/test_quantization.py::TestStaticQuantizedModule::test_embedding_bag_api, test/test_quantization.py::TestStaticQuantizedModule::test_hard_swish, test/test_quantization.py::TestStaticQuantizedModule::test_linear_relu, test/test_quantization.py::TestStaticQuantizedModule::test_pool_api, test/test_quantization.py::TestDynamicQuantizedModule::test_cell_api, test/test_quantization.py::TestDynamicQuantizedModule::test_dynamic_convtranspose1d, test/test_quantization.py::TestDynamicQuantizedModule::test_linear_api, test/test_quantization.py::TestRecordHistogramObserver::test_record_observer, test/test_quantization.py::TestFusedObsFakeQuantModule::test_default_fused_qat_config, test/test_quantization.py::TestFusedObsFakeQuantModule::test_embedding_bag_qat_config, test/test_quantization.py::TestBackendConfig::test_backend_op_config_add_dtype_config, test/test_quantization.py::TestBackendConfig::test_backend_op_config_set_fused_module, test/test_quantization.py::TestQuantizeEagerPTQStatic::test_nested1, test/test_quantization.py::TestQuantizeEagerPTQStatic::test_normalization, test/test_quantization.py::TestQuantizeEagerPTQStatic::test_quantwrapper_attaches_qconfig_to_dequant, test/test_quantization.py::TestQuantizeEagerPTQStatic::test_resnet_base, test/test_quantization.py::TestQuantizeEagerPTQDynamic::test_embedding_bag_dynamic, test/test_quantization.py::TestQuantizeEagerPTQDynamic::test_quantized_rnn, test/test_quantization.py::TestQuantizeEagerOps::test_conv_transpose_2d, test/test_quantization.py::TestQuantizeEagerQAT::test_conv_linear_symm, test/test_quantization.py::TestQuantizeEagerQAT::test_defused_embedding_bag_linear, test/test_quantization.py::TestQuantizeEagerQAT::test_forward_hooks_preserved, test/test_quantization.py::TestQuantizeEagerQAT::test_qat_embedding_bag_errors, test/test_quantization.py::TestQuantizeEagerQATNumerics::test_leaky_relu, test/test_quantization.py::TestBiasCorrectionEager::test_conv_chain, test/test_quantization.py::TestQuantizeFx::test_backend_config_check_for_weight_and_bias, test/test_quantization.py::TestQuantizeFx::test_change_backend_config_for_fixed_qparam_ops, test/test_quantization.py::TestQuantizeFx::test_conv_bn_relu, test/test_quantization.py::TestQuantizeFx::test_conv_transpose_relu_not_reference, test/test_quantization.py::TestQuantizeFx::test_default_quant_after_none_qconfig, test/test_quantization.py::TestQuantizeFx::test_linear_bn, test/test_quantization.py::TestQuantizeFx::test_no_obs_between_unmatched_node_and_copy_node, test/test_quantization.py::TestQuantizeFx::test_observer_fqn, test/test_quantization.py::TestQuantizeFx::test_qconfig_for_call_method, test/test_quantization.py::TestQuantizeFx::test_qconfig_mapping_set_object_type, test/test_quantization.py::TestQuantizeFx::test_qconfig_module_name_regex, test/test_quantization.py::TestQuantizeFx::test_qnnpack_backend_config, test/test_quantization.py::TestQuantizeFx::test_qparams_fqn, test/test_quantization.py::TestQuantizeFx::test_repeat_nontensor_args_not_observed, test/test_quantization.py::TestQuantizeFx::test_unsqueeze_nontensor_args_not_observed, test/test_quantization.py::TestQuantizeFxOps::test_chunk, test/test_quantization.py::TestQuantizeFxOps::test_fixed_qparams_ops_fp16, test/test_quantization.py::TestQuantizeFxOps::test_gelu_normal, test/test_quantization.py::TestQuantizeFxOps::test_general_shape_ops, test/test_quantization.py::TestQuantizeFxOps::test_general_value_ops, test/test_quantization.py::TestQuantizeFxOps::test_pixel_shuffle, test/test_quantization.py::TestQuantizeFxOps::test_pixel_shuffle_module, test/test_quantization.py::TestQuantizeFxOps::test_prelu, test/test_quantization.py::TestQuantizeFxOps::test_qbatch_norm_relu, test/test_quantization.py::TestQuantizeFxOps::test_reshape_fp16, test/test_quantization.py::TestQuantizeFxOps::test_silu_reference, test/test_quantization.py::TestQuantizeFxOps::test_softmax_normal, test/test_quantization.py::TestQuantizeFxOps::test_sum, test/test_quantization.py::TestQuantizeFxModels::test_qat_functional_linear, test/test_quantization.py::TestNumericDebugger::test_quantize_pt2e_preserve_handle, test/test_quantization.py::TestNumericDebugger::test_run_decompositions_same_handle_id, test/test_quantization.py::TestQuantizePT2E::test_input_edge_sanity_check, test/test_quantization.py::TestQuantizePT2E::test_observer_callback, test/test_quantization.py::TestQuantizePT2E::test_shared_qspec_transitivity_case_2, test/test_quantization.py::TestQuantizePT2E::test_speed, test/test_quantization.py::TestQuantizePT2EAffineQuantization::test_channel_group_quantization, test/test_quantization.py::TestXNNPACKQuantizer::test_add_mul_scalar, test/test_quantization.py::TestXNNPACKQuantizer::test_conv1d_with_conv2d, test/test_quantization.py::TestXNNPACKQuantizer::test_conv_linear_no_permute, test/test_quantization.py::TestXNNPACKQuantizer::test_set_module_type, test/test_quantization.py::TestQuantizePT2EX86Inductor::test_cat_recipe, test/test_quantization.py::TestQuantizePT2EX86Inductor::test_linear_binary_unary_qat, test/test_quantization.py::TestQuantizePT2EX86Inductor::test_set_module_name_qconfig, test/test_quantization.py::TestQuantizePT2EX86Inductor::test_set_module_name_qconfig_for_dynamic_quant, test/test_quantization.py::TestQuantizePT2EX86Inductor::test_set_module_name_with_mixed_configs, test/test_quantization.py::TestQuantizePT2EQAT_ConvBn1d::test_fold_bn_erases_add_node, test/test_quantization.py::TestQuantizePT2EQAT_ConvBn1d::test_fold_bn_erases_bn_node, test/test_quantization.py::TestQuantizePT2EQAT_ConvBn2d::test_qat_conv_bn_fusion, test/test_quantization.py::TestQuantizePT2EQAT_ConvBn2d::test_qat_conv_bn_relu_fusion_cuda, test/test_quantization.py::TestQuantizePT2EQAT_ConvBn2d::test_qat_conv_transpose_bn_relu, test/test_quantization.py::TestFXGraphMatcher::test_matching_failure_node_type, test/test_quantization.py::TestFXGraphMatcher::test_nodes_with_equal_types_get_matched, test/test_quantization.py::TestFXGraphMatcher::test_user_defined_function, test/test_quantization.py::TestFXNumericSuiteCoreAPIs::test_add_shadow_loggers_cuda, test/test_quantization.py::TestFXNumericSuiteCoreAPIs::test_extract_weights_conv_fun_qat, test/test_quantization.py::TestFXNumericSuiteCoreAPIs::test_extract_weights_fqn, test/test_quantization.py::TestFXNumericSuiteCoreAPIs::test_int8_shadows_int8_fun, test/test_quantization.py::TestFXNumericSuiteCoreAPIs::test_logging_inputs, test/test_quantization.py::TestFXNumericSuiteCoreAPIs::test_match_activations_fun_ptq, test/test_quantization.py::TestFXNumericSuiteCoreAPIs::test_match_activations_mod_ptq, test/test_quantization.py::TestFXNumericSuiteCoreAPIs::test_shadow_loggers_preserve_qat_numerics, test/test_quantization.py::TestFXNumericSuiteCoreAPIs::test_user_module, test/test_quantization.py::TestFXNumericSuiteNShadows::test_add_loggers_conv_bn_relu_fusion_fp32, test/test_quantization.py::TestFXNumericSuiteNShadows::test_add_loggers_conv_bn_relu_fusion_quant, test/test_quantization.py::TestFXNumericSuiteNShadows::test_qconfig_multi_mapping_ordering, test/test_quantization.py::TestFxModelReportDetector::test_multiple_q_config_options, test/test_quantization.py::TestFxModelReportObserver::test_random_epochs_and_batches, test/test_quantization.py::TestFxModelReportClass::test_constructor, test/test_quantization.py::TestFxModelReportClass::test_prepare_model_callibration, test/test_quantization.py::TestEqualizeFx::test_input_weight_equalization_convert, test/test_quantization.py::TestQuantizeJit::test_conv, test/test_quantization.py::TestQuantizeJitPasses::test_insert_observers_for_general_ops, test/test_quantization.py::TestQuantizeJitPasses::test_insert_observers_for_if_consistent_observation, test/test_quantization.py::TestQuantizeJitOps::test_cat_linear, test/test_quantization.py::TestQuantizeJitOps::test_hardswish, test/test_quantization.py::TestQuantizeJitOps::test_layer_norm, test/test_quantization.py::TestQuantizeJitOps::test_quantized_mul, test/test_quantization.py::TestQuantizeDynamicJitOps::test_embedding_bag_padding_idx_error, test/test_quantization.py::TestAOMigrationQuantization::test_function_import_fake_quantize, test/test_quantization.py::TestAOMigrationNNQuantized::test_import_nn_quantized_dynamic_import, test/test_quantization.py::TestAOMigrationNNQuantized::test_modules_activation, test/test_quantization.py::TestAOMigrationNNQuantized::test_modules_batchnorm, test/test_quantization.py::TestAOMigrationNNQuantized::test_modules_conv, test/test_quantization.py::TestAOMigrationNNQuantized::test_modules_functional_modules, test/test_quantization.py::TestAOMigrationQuantizationFx::test_function_import_fx_prepare, test/test_quantization.py::TestFloat8DtypeCUDA::test_cast_round_trip_soak_cuda_float8_e4m3fn 2025-08-14T22:44:49.0139627Z 2025-08-14T22:44:49.0139820Z Running inductor/test_aot_inductor_utils 1/1 ... [2025-08-14 22:44:49.006920] 2025-08-14T22:44:49.0140184Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T22:44:49.0141025Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_aot_inductor_utils.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 22:44:49.007248] 2025-08-14T22:44:55.5449723Z 2025-08-14T22:44:55.5451050Z inductor/test_aot_inductor_utils 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_aot_inductor_utils_1.1_a9453d7b7f783b3f_.log 2025-08-14T22:44:55.5451770Z Running 0 items in this shard: 2025-08-14T22:44:55.5451929Z 2025-08-14T22:44:55.5455232Z Running test_ops 2/9 ... [2025-08-14 22:44:55.544949] 2025-08-14T22:44:55.5455553Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T22:44:55.5456367Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_ops.py', '-m', 'not serial', '--shard-id=2', '--num-shards=9', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 22:44:55.545299] 2025-08-14T22:49:45.1194279Z 2025-08-14T22:49:45.1195405Z test_quantization 8/9 was successful, full logs can be found in artifacts with path test/test-reports/test_quantization_8.9_fe2cd6a2946d4629_.log 2025-08-14T22:49:45.1256127Z Running 163 items in this shard: test/test_quantization.py::TestQuantizedOps::test_add_scalar_relu, test/test_quantization.py::TestQuantizedOps::test_avg_pool3d, test/test_quantization.py::TestQuantizedOps::test_avg_pool3d_nhwc, test/test_quantization.py::TestQuantizedOps::test_channel_shuffle, test/test_quantization.py::TestQuantizedOps::test_leaky_relu, test/test_quantization.py::TestQuantizedOps::test_mean, test/test_quantization.py::TestQuantizedOps::test_qmatmul, test/test_quantization.py::TestQuantizedOps::test_qmul_relu_same_qparams, test/test_quantization.py::TestQuantizedOps::test_qrelu, test/test_quantization.py::TestQuantizedOps::test_sigmoid, test/test_quantization.py::TestQNNPackOps::test_avg_pool2d, test/test_quantization.py::TestQNNPackOps::test_mean, test/test_quantization.py::TestQuantizedLinear::test_qlinear_add_relu_pt2e, test/test_quantization.py::TestQuantizedLinear::test_qlinear_fp8, test/test_quantization.py::TestQuantizedLinear::test_qlinear_gelu_pt2e, test/test_quantization.py::TestQuantizedLinear::test_qlinear_with_input_q_dq_qweight_dq_output_fp32, test/test_quantization.py::TestQuantizedConv::test_qconv1d_relu, test/test_quantization.py::TestQuantizedConv::test_qconv2d_add, test/test_quantization.py::TestQuantizedConv::test_qconv2d_hardswish_pt2e, test/test_quantization.py::TestQuantizedConv::test_qconv2d_hardtanh_pt2e, test/test_quantization.py::TestQuantizedConv::test_qconv2d_relu, test/test_quantization.py::TestDynamicQuantizedOps::test_wrapped_fbgemm_pack_gemm_matrix_fp16_pt2_compliant, test/test_quantization.py::TestQuantizedEmbeddingOps::test_embedding_bag_2d_indices, test/test_quantization.py::TestFakeQuantizeOps::test_backward_per_channel, test/test_quantization.py::TestFakeQuantizeOps::test_backward_per_tensor, test/test_quantization.py::TestFakeQuantizeOps::test_backward_per_tensor_cachemask_cpu, test/test_quantization.py::TestFakeQuantizeOps::test_fake_quantize_per_tensor_affine_inf, test/test_quantization.py::TestFakeQuantizeOps::test_fixed_qparams_fq_module, test/test_quantization.py::TestFakeQuantizeOps::test_fq_serializable_per_tensor, test/test_quantization.py::TestFakeQuantizeOps::test_learnable_forward_per_channel_cuda, test/test_quantization.py::TestFusedObsFakeQuant::test_fused_backward_op_fake_quant_off, test/test_quantization.py::TestQuantizedTensor::test_decomposed_choose_qparams_per_token_asymmetric_backward, test/test_quantization.py::TestQuantizedTensor::test_decomposed_dequantize_per_channel, test/test_quantization.py::TestQuantizedTensor::test_decomposed_quantize_per_channel, test/test_quantization.py::TestQuantizedTensor::test_decomposed_quantize_per_tensor, test/test_quantization.py::TestQuantizedTensor::test_per_channel_to_device, test/test_quantization.py::TestQuantizedTensor::test_pickle_checkpoint_qtensor, test/test_quantization.py::TestQuantizedTensor::test_qtensor_equal, test/test_quantization.py::TestQuantizedTensor::test_qtensor_float_assignment, test/test_quantization.py::TestQuantizedTensor::test_qtensor_int_repr, test/test_quantization.py::TestQuantizedTensor::test_qtensor_legacy_new_failure, test/test_quantization.py::TestQuantizedTensor::test_qtensor_reshape, test/test_quantization.py::TestObserver::test_dynamic_quant_observer, test/test_quantization.py::TestObserver::test_histogram_observer_handle_close_to_infinity, test/test_quantization.py::TestStaticQuantizedModule::test_channel_shuffle, test/test_quantization.py::TestStaticQuantizedModule::test_conv1d_api, test/test_quantization.py::TestStaticQuantizedModule::test_conv2d_add_relu, test/test_quantization.py::TestStaticQuantizedModule::test_conv2d_api, test/test_quantization.py::TestStaticQuantizedModule::test_conv2d_relu_api, test/test_quantization.py::TestStaticQuantizedModule::test_dropout_serialization, test/test_quantization.py::TestStaticQuantizedModule::test_elu, test/test_quantization.py::TestStaticQuantizedModule::test_instance_norm, test/test_quantization.py::TestStaticQuantizedModule::test_layer_norm, test/test_quantization.py::TestStaticQuantizedModule::test_quant_dequant_api, test/test_quantization.py::TestDynamicQuantizedModule::test_gru_api, test/test_quantization.py::TestReferenceQuantizedModule::test_sparse, test/test_quantization.py::TestHistogramObserver::test_histogram_observer_single_inputs, test/test_quantization.py::TestBackendConfig::test_dtype_config_from_dict, test/test_quantization.py::TestQuantizeEagerOps::test_conv_2d, test/test_quantization.py::TestQuantizeEagerOps::test_conv_transpose_1d, test/test_quantization.py::TestQuantizeEagerQAT::test_conv_linear, test/test_quantization.py::TestQuantizeEagerQATNumerics::test_linear_bn_numerics, test/test_quantization.py::TestModelNumericsEager::test_weight_only_activation_only_fakequant, test/test_quantization.py::TestNumericSuiteEager::test_compare_model_outputs_functional_static, test/test_quantization.py::TestNumericSuiteEager::test_compare_model_outputs_lstm_dynamic, test/test_quantization.py::TestNumericSuiteEager::test_compare_model_stub_partial, test/test_quantization.py::TestNumericSuiteEager::test_compare_model_stub_submodule_static, test/test_quantization.py::TestFuseFx::test_fuse_conv_bn_add_relu_by_default, test/test_quantization.py::TestFuseFx::test_fuse_linear_bn_leaky_relu_onednn, test/test_quantization.py::TestFuseFx::test_linear_bn_leaky_relu_not_fused_by_default, test/test_quantization.py::TestFuseFx::test_linear_tanh_not_fused_by_default, test/test_quantization.py::TestQuantizeFx::test_attention, test/test_quantization.py::TestQuantizeFx::test_conv_transpose_not_reference, test/test_quantization.py::TestQuantizeFx::test_fp32_input_quantized_output, test/test_quantization.py::TestQuantizeFx::test_fp32_sum, test/test_quantization.py::TestQuantizeFx::test_fuse_custom_config_set_preserved_attributes, test/test_quantization.py::TestQuantizeFx::test_fused_module_qat_swap, test/test_quantization.py::TestQuantizeFx::test_pattern_match, test/test_quantization.py::TestQuantizeFx::test_pattern_match_constant, test/test_quantization.py::TestQuantizeFx::test_prepare_custom_config_set_preserved_attributes, test/test_quantization.py::TestQuantizeFx::test_prepare_custom_config_set_standalone_module_class, test/test_quantization.py::TestQuantizeFx::test_propagate_dtypes_for_known_nodes_dict_args, test/test_quantization.py::TestQuantizeFx::test_propagate_dtypes_for_known_nodes_dict_split_tuple_args, test/test_quantization.py::TestQuantizeFx::test_qconfig_mapping_from_dict, test/test_quantization.py::TestQuantizeFx::test_qconfig_module_type, test/test_quantization.py::TestQuantizeFx::test_quantized_input_quantized_output, test/test_quantization.py::TestQuantizeFx::test_static_lstm, test/test_quantization.py::TestQuantizeFxOps::test_ave_pool_with_custom_cfg, test/test_quantization.py::TestQuantizeFxOps::test_bmm_int_reference, test/test_quantization.py::TestQuantizeFxOps::test_boolean_tensor, test/test_quantization.py::TestQuantizeFxOps::test_float_functional, test/test_quantization.py::TestQuantizeFxOps::test_mish_reference, test/test_quantization.py::TestQuantizeFxOps::test_mul_relu, test/test_quantization.py::TestQuantizeFxOps::test_quantized_conv_relu, test/test_quantization.py::TestQuantizeFxModels::test_prepare_serialize_switch_device_convert, test/test_quantization.py::TestSubgraphRewriter::test_subgraph_rewriter_annotations_int, test/test_quantization.py::TestSubgraphRewriter::test_subgraph_rewriter_correct_output_replacement, test/test_quantization.py::TestSubgraphRewriter::test_subgraph_rewriter_with_oneliner_pattern, test/test_quantization.py::TestNumericDebugger::test_copy_preserve_handle, test/test_quantization.py::TestNumericDebugger::test_extract_results_from_loggers_list_output, test/test_quantization.py::TestQuantizePT2E::test_composable_quantizer_linear_conv, test/test_quantization.py::TestQuantizePT2E::test_quantization_dtype_float32_float8_e4m3fn, test/test_quantization.py::TestQuantizePT2EAffineQuantization::test_dynamic_per_tok_act_per_group_weights, test/test_quantization.py::TestPT2ERepresentation::test_qdq_per_channel, test/test_quantization.py::TestXNNPACKQuantizer::test_conv2d, test/test_quantization.py::TestXNNPACKQuantizer::test_qat_dynamic_linear, test/test_quantization.py::TestQuantizePT2EX86Inductor::test_conv2d_binary_unary, test/test_quantization.py::TestQuantizePT2EX86Inductor::test_linear_binary_dynamic_qat, test/test_quantization.py::TestQuantizePT2EX86Inductor::test_linear_binary_unary_serials, test/test_quantization.py::TestQuantizePT2EX86Inductor::test_linear_unary, test/test_quantization.py::TestQuantizePT2EX86Inductor::test_set_module_name_and_module_type_case2, test/test_quantization.py::TestQuantizePT2EQAT_ConvBn1d::test_qat_conv_no_bias, test/test_quantization.py::TestQuantizePT2EQAT_ConvBn1d::test_qat_conv_transpose_bn_relu, test/test_quantization.py::TestQuantizePT2EQAT_ConvBn2d::test_qat_conv_bn_bias_derived_qspec, test/test_quantization.py::TestQuantizePT2EQAT_ConvBn2d::test_qat_conv_bn_fusion_cuda, test/test_quantization.py::TestQuantizePT2EQAT_ConvBn2d::test_qat_conv_no_bias, test/test_quantization.py::TestQuantizePT2EQAT_ConvBn2d::test_qat_inplace_add_relu, test/test_quantization.py::TestFXGraphMatcher::test_nodes_before_cat, test/test_quantization.py::TestFXGraphMatcherModels::test_mobilenet_v2, test/test_quantization.py::TestFXNumericSuiteCoreAPIs::test_extract_weights_linear_fun_ptq, test/test_quantization.py::TestFXNumericSuiteCoreAPIs::test_extract_weights_linear_fun_qat, test/test_quantization.py::TestFXNumericSuiteCoreAPIs::test_extract_weights_mod_ptq, test/test_quantization.py::TestFXNumericSuiteCoreAPIs::test_extract_weights_mod_qat, test/test_quantization.py::TestFXNumericSuiteCoreAPIs::test_loggers_preserve_qat_numerics, test/test_quantization.py::TestFXNumericSuiteNShadows::test_add_loggers_mobilenet_v2, test/test_quantization.py::TestFXNumericSuiteNShadows::test_qconfig_multi_mapping_insert_padding, test/test_quantization.py::TestFXNumericSuiteNShadows::test_qconfig_multi_mapping_retroactive_padding, test/test_quantization.py::TestFXNumericSuiteCoreAPIsModels::test_compare_weights_conv, test/test_quantization.py::TestFxModelReportDetector::test_fusion_layer_in_sequential, test/test_quantization.py::TestFxModelReportDetector::test_qat_aware_model_example, test/test_quantization.py::TestFxModelReportDetector::test_sequential_model_format, test/test_quantization.py::TestFxModelReportObserver::test_observer_after_relu, test/test_quantization.py::TestFxModelReportClass::test_generate_visualizer, test/test_quantization.py::TestFxDetectInputWeightEqualization::test_input_weight_equalization_determine_points, test/test_quantization.py::TestSerialization::test_conv2d_nobias_graph_v3, test/test_quantization.py::TestSerialization::test_default_qat_qconfig, test/test_quantization.py::TestQuantizeJitPasses::test_convtranspose_trace, test/test_quantization.py::TestQuantizeJitPasses::test_fuse_linear, test/test_quantization.py::TestQuantizeJitPasses::test_insert_observers_interface, test/test_quantization.py::TestQuantizeJitPasses::test_insert_observers_propagate_observed_in_submodule, test/test_quantization.py::TestQuantizeJitPasses::test_quantize_fork_wait, test/test_quantization.py::TestQuantizeJitOps::test_elu, test/test_quantization.py::TestQuantizeJitOps::test_qbatch_norm, test/test_quantization.py::TestQuantizeJitOps::test_qbatch_norm_relu_BNFuncRelu, test/test_quantization.py::TestQuantizeJitOps::test_quantized_add_relu, test/test_quantization.py::TestQuantizeJitOps::test_quantized_mul_scalar_relu, test/test_quantization.py::TestQuantizeDynamicJitPasses::test_dynamic_quant_multi_uses, test/test_quantization.py::TestQuantizeDynamicJitPasses::test_dynamic_with_if, test/test_quantization.py::TestQuantizeDynamicJitPasses::test_insert_quant_dequant_linear_dynamic, test/test_quantization.py::TestQuantizeDynamicJitPasses::test_prepare_dynamic_child_qconfig, test/test_quantization.py::TestFusionPasses::test_quantized_add_relu_fusion, test/test_quantization.py::TestAOMigrationQuantization::test_function_import_quantization_mappings, test/test_quantization.py::TestAOMigrationNNQuantized::test_import_nn_qat_dynamic_linear, test/test_quantization.py::TestAOMigrationNNQuantized::test_import_nn_quantizable_activation, test/test_quantization.py::TestAOMigrationNNQuantized::test_modules_dropout, test/test_quantization.py::TestAOMigrationNNQuantized::test_modules_import, test/test_quantization.py::TestAOMigrationQuantizationFx::test_function_import_fx_utils, test/test_quantization.py::TestAOMigrationQuantizationFx::test_function_import_quantize_fx, test/test_quantization.py::TestBitsCUDA::test_subclass_cuda, test/test_quantization.py::TestFloat8DtypeCUDA::test_cast_round_trip_extremes_cuda_float8_e8m0fnu, test/test_quantization.py::TestFloat8DtypeCUDA::test_cast_round_trip_subnormals_cuda_float8_e4m3fn, test/test_quantization.py::TestFloat8DtypeCUDA::test_creation_with_zeros_cuda_float8_e8m0fnu, test/test_quantization.py::TestFloat8DtypeCUDA::test_float8_e8m0fnu_rne_rounding_cuda 2025-08-14T22:49:45.1330454Z 2025-08-14T22:49:45.1330597Z Running test_ops 5/9 ... [2025-08-14 22:49:45.119984] 2025-08-14T22:49:45.1330931Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T22:49:45.1331352Z GITHUB_RUN_ID, GITHUB_RUN_ATTEMPT, or ARTIFACTS_FILE_SUFFIX not set, not uploading 2025-08-14T22:49:45.1332383Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_ops.py', '-m', 'not serial', '--shard-id=5', '--num-shards=9', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 22:49:45.120491] 2025-08-14T22:49:45.1333257Z Uploading artifacts took 0.00 seconds 2025-08-14T22:55:09.7642881Z 2025-08-14T22:55:09.7644205Z test_ops 2/9 was successful, full logs can be found in artifacts with path test/test-reports/test_ops_2.9_280e564b140a95a0_.log 2025-08-14T22:55:09.8974265Z Running 3741 items in this shard: test/test_ops.py::TestCommonCUDA::test_compare_cpu_T_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu___rand___cuda_int64, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs__conversions_cdouble_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_addcmul_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_bitwise_left_shift_cuda_int64, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_diag_embed_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_empty_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_expand_as_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_flipud_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_logaddexp2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_mul_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_narrow_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_nn_functional_hinge_embedding_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_nn_functional_pixel_unshuffle_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_nn_functional_poisson_nll_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_take_along_dim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_zeros_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__segment_reduce_lengths_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__unsafe_masked_index_put_accumulate_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_as_strided_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_block_diag_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_bmm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_cartesian_prod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_cauchy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_char_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_complex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_cross_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_diag_embed_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_diagonal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_double_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_dstack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_empty_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_expand_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_fmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_geometric_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_index_add_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_index_put_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_index_reduce_prod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_eigh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_inv_ex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_lu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_matrix_rank_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_pinv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_qr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_solve_ex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_logaddexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_logcumsumexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_logdet_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_long_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_masked_cumprod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_masked_normalize_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_masked_softmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_max_pool2d_with_indices_backward_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_max_reduction_no_dim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nanmedian_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_conv2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_cosine_embedding_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_fractional_max_pool2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_grid_sample_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_interpolate_bilinear_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_max_unpool1d_grad_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_max_unpool2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_max_unpool3d_grad_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_multi_margin_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_pad_reflect_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_triplet_margin_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_ones_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_outer_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_pinverse_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_select_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_slice_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_sparse_sampled_addmm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_special_hermite_polynomial_he_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_split_with_sizes_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_std_unbiased_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_to_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_trapezoid_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_unfold_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_var_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_alias_copy_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_as_strided_copy_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_as_strided_partial_views_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_asin_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_asinh_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_clone_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_cosh_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_div_no_rounding_mode_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_eq_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_fft_fft2_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_fft_fft_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_fft_hfft_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_fft_irfftn_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_index_fill_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_isreal_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_permute_copy_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_sin_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_tril_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_triu_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_unbind_copy_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_zeros_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_dtypes___rpow___cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs__conversions_int_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_add_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_as_strided_partial_views_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_atleast_1d_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_cumsum_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_expand_copy_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_expand_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_fmod_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_geometric_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_i0_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_lerp_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_linalg_vector_norm_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_linspace_tensor_overload_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_logical_not_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_logical_or_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_mean_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_narrow_copy_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_new_empty_strided_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_glu_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_selu_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_normal_number_mean_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_positive_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_prod_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_remainder_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_reshape_as_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_softmax_with_dtype_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_special_erfcx_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_special_i1e_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_special_softmax_with_dtype_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_sqrt_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_t_copy_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_take_along_dim_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_var_mean_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_vsplit_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__upsample_bilinear2d_aa_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_addmm_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_amin_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_arange_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_argsort_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_atleast_1d_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_bitwise_right_shift_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_bmm_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_ceil_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_chalf_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_cholesky_inverse_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_conj_physical_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_constant_pad_nd_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_copysign_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_cummin_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_diagonal_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_div_floor_rounding_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_erfc_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_fft_irfft_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_flip_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_floor_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_gcd_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_grid_sampler_2d_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_index_put_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_isfinite_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_isin_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_lgamma_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_cholesky_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_cond_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_lstsq_grad_oriented_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_lu_factor_ex_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_vander_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_log10_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_log_normal_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_logaddexp_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_masked_scatter_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_max_reduction_no_dim_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_movedim_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_msort_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_adaptive_avg_pool1d_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_batch_norm_without_cudnn_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_bilinear_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_conv_transpose1d_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_conv_transpose3d_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_cosine_embedding_loss_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_dropout2d_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_dropout_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_fractional_max_pool3d_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_group_norm_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_hardtanh_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_max_pool1d_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_max_unpool2d_grad_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_silu_complex_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_silu_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_smooth_l1_loss_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_soft_margin_loss_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_unfold_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_roll_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_signal_windows_hann_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_sparse_mm_reduce_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_special_i0e_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_tanh_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_torch_ops_aten__efficient_attention_forward_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_tril_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_view_as_real_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_zeros_cuda, test/test_ops.py::TestCommonCUDA::test_errors___rsub___cuda, test/test_ops.py::TestCommonCUDA::test_errors___rxor___cuda, test/test_ops.py::TestCommonCUDA::test_errors_copysign_cuda, test/test_ops.py::TestCommonCUDA::test_errors_diagonal_copy_cuda, test/test_ops.py::TestCommonCUDA::test_errors_diff_cuda, test/test_ops.py::TestCommonCUDA::test_errors_dot_cuda, test/test_ops.py::TestCommonCUDA::test_errors_eq_cuda, test/test_ops.py::TestCommonCUDA::test_errors_flipud_cuda, test/test_ops.py::TestCommonCUDA::test_errors_fmin_cuda, test/test_ops.py::TestCommonCUDA::test_errors_isclose_cuda, test/test_ops.py::TestCommonCUDA::test_errors_masked_select_cuda, test/test_ops.py::TestCommonCUDA::test_errors_max_binary_cuda, test/test_ops.py::TestCommonCUDA::test_errors_median_cuda, test/test_ops.py::TestCommonCUDA::test_errors_nn_functional_adaptive_avg_pool3d_cuda, test/test_ops.py::TestCommonCUDA::test_errors_nn_functional_adaptive_max_pool1d_cuda, test/test_ops.py::TestCommonCUDA::test_errors_nn_functional_gaussian_nll_loss_cuda, test/test_ops.py::TestCommonCUDA::test_errors_nn_functional_softshrink_cuda, test/test_ops.py::TestCommonCUDA::test_errors_nn_functional_triplet_margin_loss_cuda, test/test_ops.py::TestCommonCUDA::test_errors_remainder_cuda, test/test_ops.py::TestCommonCUDA::test_errors_rot90_cuda, test/test_ops.py::TestCommonCUDA::test_errors_scatter_cuda, test/test_ops.py::TestCommonCUDA::test_errors_signal_windows_exponential_cuda, test/test_ops.py::TestCommonCUDA::test_errors_signal_windows_kaiser_cuda, test/test_ops.py::TestCommonCUDA::test_errors_sparse_zeros_like_layout3_cuda, test/test_ops.py::TestCommonCUDA::test_errors_special_chebyshev_polynomial_u_cuda, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_addcmul_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_addr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_all_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_arange_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_argmin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_asinh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_cat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_cholesky_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_cumprod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_cumsum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_erf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_fft_rfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_fft_rfftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_floor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_hash_tensor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_index_reduce_prod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_isin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_linalg_cond_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_linalg_matrix_rank_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_linalg_matrix_rank_hermitian_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_linalg_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_linspace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_log1p_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_lu_unpack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_max_reduction_with_dim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_multinomial_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_narrow_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_nn_functional_softplus_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_norm_fro_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_polygamma_polygamma_n_2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_reciprocal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_special_chebyshev_polynomial_u_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_special_erfcx_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_special_i1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_special_i1e_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_special_laguerre_polynomial_l_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_special_ndtri_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_std_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_tanh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_topk_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_unsqueeze_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_xlogy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices___rdiv___cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices___rsub___cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices___rsub___cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices__chunk_cat_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices__native_batch_norm_legit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices__unsafe_masked_index_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_addcmul_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_addr_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_any_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_argmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_argmax_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_as_strided_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_as_strided_partial_views_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_as_strided_scatter_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_asin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_atan2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_atanh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_bincount_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_cartesian_prod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_clamp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_clamp_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_column_stack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_combinations_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_cov_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_cumulative_trapezoid_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_deg2rad_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_diagonal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_diagonal_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_digamma_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_digamma_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_div_floor_rounding_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_div_floor_rounding_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_empty_strided_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_erf_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_erfc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_exp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_fft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_hfft_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_ihfft_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_irfftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_flipud_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_floor_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fmin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fmod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_ge_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_geqrf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_grid_sampler_2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_hstack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_hstack_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_index_fill_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_index_put_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_int_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_isclose_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_isinf_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_isnan_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_isposinf_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_item_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_kron_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_ldexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_lgamma_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_ldl_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_lstsq_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_qr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_vecdot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_log2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_logaddexp2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_logical_xor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_logit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_logspace_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_long_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_lu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_amin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_cumsum_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_std_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_max_reduction_no_dim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_maximum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_narrow_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_narrow_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_new_full_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_new_ones_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_conv1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_fractional_max_pool3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_local_response_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_logsigmoid_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_max_unpool1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_max_unpool3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_mse_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_nll_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_pixel_unshuffle_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_prelu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_rrelu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_softsign_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_tanhshrink_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_norm_fro_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_normal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_polygamma_polygamma_n_4_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_put_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_qr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_quantile_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_repeat_interleave_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_round_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_scalar_tensor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_scalar_tensor_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_scatter_reduce_prod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_sgn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_sin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_bessel_j0_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_entr_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_legendre_polynomial_p_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_log_ndtr_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_modified_bessel_i0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_modified_bessel_k1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_modified_bessel_k1_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_xlog1py_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_square_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_squeeze_multiple_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_std_mean_unbiased_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_t_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_take_along_dim_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_trapz_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_tril_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_zero__cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_zeros_cuda_float32, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_alias_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_argsort_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_atleast_1d_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_clamp_max_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_combinations_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_deg2rad_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_diagonal_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_double_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_dsplit_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_empty_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_eye_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_hfftn_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_ifftn_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_irfft2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_ge_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_hsplit_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_index_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_isreal_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_item_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_jiterator_2inputs_2outputs_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_log1p_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_logical_not_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_logit_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_logsumexp_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_mT_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_masked_sum_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_meshgrid_list_of_tensors_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_min_reduction_no_dim_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_new_zeros_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_nn_functional_softsign_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_ravel_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_slice_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_bessel_y0_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_scaled_modified_bessel_k1_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_shifted_chebyshev_polynomial_w_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_sum_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_take_along_dim_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_tensor_split_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_to_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_trace_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_true_divide_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_unfold_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_zeros_like_cuda_bool, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_T_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples___getitem___cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples___rpow___cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples__unsafe_masked_index_put_accumulate_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples__unsafe_masked_index_put_accumulate_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_acosh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_addbmm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_addcdiv_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_addcmul_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_all_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_aminmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_aminmax_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_argwhere_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_as_strided_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_as_strided_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_as_strided_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_atan2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_bernoulli_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_bitwise_right_shift_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cfloat_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_clamp_min_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_column_stack_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cummin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cumsum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_double_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_double_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_empty_like_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_empty_strided_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_erfc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_erfc_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_exp2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_exp_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_fft2_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_fftshift_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_hfft2_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_ifftshift_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_ihfftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_irfft2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_flatten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_flip_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fliplr_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_floor_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_floor_divide_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_frexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_full_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_full_like_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_gcd_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_geometric_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_gradient_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_gt_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_hash_tensor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_igammac_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_index_put_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_index_reduce_amin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_index_reduce_amin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isfinite_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isposinf_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isreal_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_jiterator_binary_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_jiterator_binary_return_by_ref_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_kron_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_kron_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_kthvalue_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_lcm_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_lgamma_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_cond_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_cond_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_diagonal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_lu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_matrix_power_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_norm_subgradients_at_zero_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_pinv_hermitian_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_solve_ex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_tensorsolve_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_tensorsolve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_vander_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_vector_norm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linspace_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linspace_tensor_overload_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logspace_tensor_overload_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mH_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mT_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_argmax_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_argmin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_cumprod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_fill_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_prod_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_softmin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_sum_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_max_reduction_no_dim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_max_reduction_with_dim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_min_reduction_no_dim_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mode_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mvlgamma_mvlgamma_p_1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mvlgamma_mvlgamma_p_5_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nan_to_num_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_adaptive_avg_pool3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_alpha_dropout_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_cosine_embedding_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_cosine_similarity_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_elu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_hardtanh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_linear_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pad_reflect_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pad_reflect_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pdist_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_poisson_nll_loss_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_relu_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_rrelu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_scaled_dot_product_attention_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_silu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_softshrink_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_tanhshrink_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_unfold_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_norm_fro_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_ones_like_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_ones_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_polygamma_polygamma_n_2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_polygamma_polygamma_n_2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_positive_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_pow_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_randn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_randn_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_ravel_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_remainder_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_renorm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_repeat_interleave_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_resize__cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_resolve_neg_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_round_decimals_neg_3_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_searchsorted_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_signal_windows_exponential_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_signal_windows_general_cosine_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_signal_windows_hamming_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_signal_windows_nuttall_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_bessel_j0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_bessel_y0_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_chebyshev_polynomial_u_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_laguerre_polynomial_l_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_modified_bessel_i1_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_xlog1py_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_square_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_std_mean_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sum_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_svd_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_t_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_tan_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_tanh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_tensordot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_topk_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_torch_ops_aten__safe_softmax_default_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_trace_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_transpose_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_transpose_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_tril_indices_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_triu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_true_divide_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unsafe_chunk_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unsafe_split_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_var_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_view_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_vstack_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_where_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_xlogy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_zero__cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_zeros_like_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_addbmm_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_numpy_ref_broadcast_to_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_numpy_ref_clone_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_clone_cuda_int64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_equal_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_flatten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_numpy_ref_linalg_cross_cuda_int64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_linalg_vecdot_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_nn_functional_conv_transpose1d_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_numpy_ref_nn_functional_one_hot_cuda_int64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_nn_functional_pdist_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_nn_functional_rms_norm_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_numpy_ref_repeat_cuda_int64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_searchsorted_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_signal_windows_blackman_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_transpose_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_unbind_cuda_float64, test/test_ops.py::TestCommonCUDA::test_out__batch_norm_with_update_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs__conversions_char_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs__conversions_double_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs__conversions_float_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_abs_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_bucketize_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_cat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_chunk_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_diag_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_diag_embed_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_fft_fftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_fft_hfftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_fft_ifft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_fft_irfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_fill_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_fmin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_gcd_cuda_int64, test/test_ops.py::TestCommonCUDA::test_out__refs_gt_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_index_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_index_select_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_isposinf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_linalg_vecdot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_logical_or_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_lt_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_meshgrid_variadic_tensors_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_mul_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_narrow_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_nextafter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_huber_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_leaky_relu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_margin_ranking_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_triplet_margin_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_remainder_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_round_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_special_bessel_j1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_split_with_sizes_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_std_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_unsqueeze_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_view_as_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_view_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_alias_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_allclose_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_argmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_argwhere_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_as_strided_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_atan2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_bmm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_bool_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_chalf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_clamp_min_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_combinations_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_count_nonzero_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_diagflat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_diagonal_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_double_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_empty_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_erf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_exp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_expm1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_fft_hfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_fft_ihfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_fft_rfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_floor_divide_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_fmin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_frac_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_full_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_full_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_index_reduce_prod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_integral_dtype__refs_prod_cuda_int16, test/test_ops.py::TestCommonCUDA::test_out_linalg_cholesky_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_cross_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_inv_ex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_vector_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_mH_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_matrix_exp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_max_pool2d_with_indices_backward_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_median_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nanmean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_new_ones_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_batch_norm_without_cudnn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_conv1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_cosine_similarity_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_cross_entropy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_hardshrink_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_hardtanh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_max_unpool3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_relu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_silu_complex_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_softplus_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_ormqr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_polygamma_polygamma_n_4_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_qr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_randn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_remainder_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_addcdiv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_addmm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_as_strided_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_asinh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_atan_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_atanh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_baddbmm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_cos_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_cumsum_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_deg2rad_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_diff_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_div_floor_rounding_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_exp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_expand_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_expand_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_expm1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_fft_fft2_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_fft_ifftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_fft_rfftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_index_add_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_index_reduce_amax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linalg_eigh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linalg_eigvals_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linalg_lstsq_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linalg_lu_solve_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linalg_lu_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linalg_multi_dot_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linalg_pinv_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linalg_solve_ex_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linalg_svdvals_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linalg_tensorsolve_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linalg_tensorsolve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linspace_tensor_overload_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_lu_unpack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_max_binary_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_nanmean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_nansum_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_nn_functional_avg_pool2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_nn_functional_logsigmoid_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_norm_fro_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_norm_inf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_norm_nuc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_ones_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_round_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_round_decimals_0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_scatter_add_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_special_entr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_special_i1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_special_ndtr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_split_with_sizes_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_stack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_svd_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_tan_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_tril_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_triu_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_unbind_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_view_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_zeros_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_resize__cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_rsub_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_scatter_add_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_scatter_reduce_amax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_signal_windows_nuttall_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_softmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_special_legendre_polynomial_p_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_special_modified_bessel_k0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_special_scaled_modified_bessel_k0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_split_list_args_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_squeeze_multiple_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_std_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_t_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_t_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_tensordot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_to_sparse_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_topk_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_trace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_triangular_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_view_as_real_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_warning___rmod___cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs__conversions_double_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_abs_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_alias_copy_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_allclose_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_atleast_3d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_clamp_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_div_no_rounding_mode_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_eq_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_exp2_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_expand_as_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_expand_copy_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_fft_ifftn_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_floor_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_geometric_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_hstack_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_isneginf_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_logaddexp2_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_mul_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_narrow_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_new_empty_strided_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nextafter_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_celu_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_channel_shuffle_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_l1_loss_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_nll_loss_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_prelu_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_softshrink_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_normal__in_place_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_reshape_as_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_round_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_special_bessel_j0_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_special_entr_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_special_multigammaln_mvlgamma_p_5_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_special_ndtr_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_triu_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_zeros_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_addbmm_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_allclose_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_angle_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_argsort_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_as_strided_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_atleast_3d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_cartesian_prod_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_cdist_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_cfloat_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_column_stack_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_contiguous_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_diag_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_diagflat_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_diagonal_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_exp2_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_expm1_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_float_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_float_power_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_fmax_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_geqrf_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_hstack_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_imag_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_index_reduce_prod_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_isfinite_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_istft_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_item_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_jiterator_binary_return_by_ref_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_le_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_ldl_factor_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_lu_solve_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_matrix_rank_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_svdvals_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_tensorinv_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_tensorsolve_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_vecdot_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_vector_norm_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_log_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_logdet_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_lt_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_lu_unpack_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_masked_cumprod_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_masked_fill_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_masked_select_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_masked_sum_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_max_binary_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_max_reduction_with_dim_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_min_reduction_with_dim_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_mul_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_adaptive_max_pool1d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_alpha_dropout_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_dropout3d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_embedding_bag_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_embedding_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_hardsigmoid_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_l1_loss_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_max_unpool2d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_max_unpool3d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_max_unpool3d_grad_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_pixel_unshuffle_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_rrelu_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_silu_complex_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_softmin_with_dtype_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_ormqr_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_polar_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_polygamma_polygamma_n_1_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_pow_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_quantile_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_randn_like_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_round_decimals_0_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_scatter_reduce_sum_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_select_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_sgn_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_softmax_with_dtype_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_sparse_sampled_addmm_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_chebyshev_polynomial_v_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_erfcx_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_hermite_polynomial_he_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_modified_bessel_k1_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_shifted_chebyshev_polynomial_t_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_shifted_chebyshev_polynomial_v_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_shifted_chebyshev_polynomial_w_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_square_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_squeeze_multiple_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_take_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_triangular_solve_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_unique_consecutive_cuda, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float___rdiv___cuda_int16, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_acos_cuda_int16, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_acos_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_acosh_cuda_int8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_asinh_cuda_int32, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_asinh_cuda_int8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_atan2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_atan_cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_atan_cuda_int16, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_cos_cuda_int32, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_cosh_cuda_int8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_div_no_rounding_mode_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_erfc_cuda_int64, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_erfinv_cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_erfinv_cuda_int32, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_erfinv_cuda_int8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_exp_cuda_int8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_ldexp_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_lgamma_cuda_int8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_log1p_cuda_int64, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_log_cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_logit_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_polygamma_polygamma_n_1_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_polygamma_polygamma_n_3_cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_polygamma_polygamma_n_4_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_rad2deg_cuda_int8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_reciprocal_cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_reciprocal_cuda_int16, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_reciprocal_cuda_int32, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_sigmoid_cuda_int32, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_sigmoid_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_chebyshev_polynomial_t_cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_chebyshev_polynomial_u_cuda_int8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_chebyshev_polynomial_v_cuda_int8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_chebyshev_polynomial_v_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_chebyshev_polynomial_w_cuda_int16, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_chebyshev_polynomial_w_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_laguerre_polynomial_l_cuda_int32, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_laguerre_polynomial_l_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_legendre_polynomial_p_cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_legendre_polynomial_p_cuda_int8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_shifted_chebyshev_polynomial_v_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_shifted_chebyshev_polynomial_w_cuda_int64, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_xlog1py_cuda_int16, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_sqrt_cuda_int64, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_sqrt_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_tan_cuda_int8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_tanh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_xlogy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_xlogy_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_T_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bfloat16_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_byte_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cdouble_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cfloat_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cfloat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cfloat_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_chalf_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_char_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_float_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_float_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_float_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_half_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_int_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_long_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_short_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_short_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_short_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_abs_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_abs_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_acos_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_acos_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_acosh_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_add_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_addcmul_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_addcmul_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_alias_copy_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_alias_copy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_amax_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_amax_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_any_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_any_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_arange_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_copy_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_partial_views_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_scatter_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_scatter_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_asin_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_asin_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_asinh_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atan2_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atan2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atan_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atan_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atan_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atanh_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atanh_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_1d_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_1d_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_1d_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_2d_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_2d_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_3d_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_3d_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_or_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_block_diag_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_block_diag_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_block_diag_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_block_diag_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_tensors_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bucketize_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ceil_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ceil_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_chunk_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_chunk_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_min_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_column_stack_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_column_stack_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_physical_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_constant_pad_nd_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_constant_pad_nd_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_constant_pad_nd_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_contiguous_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_contiguous_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_contiguous_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cosh_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cumprod_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cumsum_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_copy_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_copy_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_div_trunc_rounding_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_dot_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_dot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_dsplit_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_dsplit_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_dsplit_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_dstack_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_like_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_like_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_strided_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_eq_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_eq_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_equal_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_erf_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_erfc_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_erfc_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_erfinv_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_exp2_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_exp2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_exp2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_exp_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_exp_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_exp_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_as_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_as_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_eye_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_eye_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_eye_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft2_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft2_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft2_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftn_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftshift_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftshift_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftshift_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft2_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft2_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfftn_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfftn_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftn_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftshift_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfft2_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfft2_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfft_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfftn_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfftn_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft2_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft2_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft2_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfft2_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfft2_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfft2_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfft_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfftn_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fill_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flatten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flatten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flip_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flip_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flip_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fliplr_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flipud_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_floor_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_floor_divide_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_floor_divide_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fmax_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fmax_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fmin_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fmod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fmod_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_frac_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_frac_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ge_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ge_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_gt_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_hsplit_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_hsplit_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_hsplit_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_hstack_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_hstack_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_hstack_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_hstack_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_hstack_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_hypot_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_copy_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_fill_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_select_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_select_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_select_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_select_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isclose_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isclose_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isfinite_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isfinite_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isinf_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isinf_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isinf_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_lcm_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_le_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_lerp_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_diagonal_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_diagonal_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_matrix_norm_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_matrix_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_vecdot_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linspace_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linspace_tensor_overload_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log10_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log10_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log1p_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log2_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_and_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_or_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_xor_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logspace_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logsumexp_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logsumexp_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logsumexp_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_masked_fill_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_masked_fill_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_list_of_tensors_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_minimum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_movedim_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_mul_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nan_to_num_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nan_to_num_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_copy_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_copy_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ne_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_neg_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_strided_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_full_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_full_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_ones_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_ones_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_zeros_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_zeros_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_zeros_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nextafter_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_channel_shuffle_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_gelu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_glu_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_glu_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_hardtanh_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_hardtanh_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_hinge_embedding_loss_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_hinge_embedding_loss_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_huber_loss_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_log_softmax_with_dtype_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_mish_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_mse_loss_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pairwise_distance_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pairwise_distance_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pixel_shuffle_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pixel_shuffle_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pixel_unshuffle_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pixel_unshuffle_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pixel_unshuffle_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_poisson_nll_loss_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_poisson_nll_loss_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_poisson_nll_loss_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_relu_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_selu_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softmax_with_dtype_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softmax_with_dtype_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softmin_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softmin_with_dtype_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softmin_with_dtype_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softplus_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softplus_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_tanhshrink_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_tanhshrink_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_tanhshrink_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_normal__in_place_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_normal_number_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ones_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_permute_copy_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_positive_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_positive_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_positive_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_pow_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rad2deg_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ravel_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_real_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_real_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_real_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_real_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_real_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_remainder_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_renorm_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_repeat_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_repeat_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_as_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_roll_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_roll_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_roll_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_roll_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rot90_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rot90_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_round_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rsqrt_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rsub_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rsub_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rsub_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sigmoid_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sigmoid_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sigmoid_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sign_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_signbit_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_signbit_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sinc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sinh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sinh_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_softmax_with_dtype_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_bessel_j0_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_bessel_j0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_bessel_j1_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_entr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_erfcx_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_erfcx_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i1_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i1e_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i1e_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i1e_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i1e_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_log_softmax_with_dtype_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_log_softmax_with_dtype_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_log_softmax_with_dtype_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_logit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_logit_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_1_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_1_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_3_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_3_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_ndtri_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_ndtri_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_softmax_with_dtype_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_xlog1py_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_xlog1py_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_split_with_sizes_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_split_with_sizes_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sqrt_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_square_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_square_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_std_mean_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sub_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sub_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_to_size_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_to_size_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_to_size_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_to_size_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_to_size_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_t_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_t_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_t_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_take_along_dim_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_take_along_dim_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tan_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tan_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tan_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tanh_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tanh_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tensor_split_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tensor_split_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_to_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_trace_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_trace_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_transpose_copy_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tril_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_triu_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_trunc_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unbind_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unbind_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unbind_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unflatten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unflatten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unsqueeze_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unsqueeze_copy_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unsqueeze_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_var_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_vdot_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_vdot_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_vdot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_as_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_vsplit_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_vsplit_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_vstack_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_vstack_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_where_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_where_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_xlogy_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_zeros_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_zeros_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_atan2_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_cauchy_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_clamp_min_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_div_floor_rounding_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_dsplit_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_eye_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fft_fft2_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fft_ifft_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fft_ihfft2_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fft_rfft_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_hsplit_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_masked_fill_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_nn_functional_huber_loss_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_vstack_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_T_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_complex_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_float_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_half_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_int_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_int_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_all_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amax_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amax_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amin_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_any_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_arange_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_arange_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_copy_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_copy_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_copy_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_partial_views_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_partial_views_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_left_shift_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_right_shift_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_xor_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_xor_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_block_diag_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_block_diag_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_tensors_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_tensors_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_tensors_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_to_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bucketize_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ceil_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ceil_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_max_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_min_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_min_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_column_stack_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_column_stack_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_physical_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_physical_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_constant_pad_nd_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_copysign_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumsum_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumsum_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_deg2rad_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_deg2rad_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_scatter_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_scatter_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_scatter_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_scatter_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_scatter_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_digamma_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_floor_rounding_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_floor_rounding_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_trunc_rounding_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_strided_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_strided_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_strided_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_equal_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfc_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfc_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp2_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp2_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp2_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_as_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expm1_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expm1_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eye_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftn_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft2_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfftn_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft2_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft2_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftn_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftshift_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftshift_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft2_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfftn_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfftn_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfftn_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft2_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flip_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flip_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flip_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flipud_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flipud_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flipud_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flipud_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flipud_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_float_power_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_float_power_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_divide_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_divide_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_divide_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmax_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmin_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmod_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_frac_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ge_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_i0_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_i0_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_i0_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_copy_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_fill_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_fill_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_fill_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isclose_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isnan_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isneginf_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isneginf_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_istft_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_item_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_item_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_item_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_le_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_le_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_le_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lerp_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lerp_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_cross_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_cross_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_diagonal_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_matrix_norm_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_svd_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_vecdot_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_vector_norm_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_tensor_overload_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_tensor_overload_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log10_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log2_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log2_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logaddexp_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_and_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_and_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logspace_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logspace_tensor_overload_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logsumexp_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lt_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_maximum_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_minimum_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mul_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mul_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mul_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ne_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_neg_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_neg_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_neg_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_full_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_full_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_zeros_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nextafter_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nextafter_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_alpha_dropout_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_alpha_dropout_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_celu_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_channel_shuffle_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_channel_shuffle_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_channel_shuffle_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_gelu_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_glu_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hardtanh_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_l1_loss_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_layer_norm_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_leaky_relu_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_leaky_relu_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_margin_ranking_loss_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_margin_ranking_loss_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pdist_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pixel_shuffle_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pixel_shuffle_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pixel_unshuffle_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pixel_unshuffle_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pixel_unshuffle_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu6_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_smooth_l1_loss_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_smooth_l1_loss_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_triplet_margin_loss_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_triplet_margin_loss_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_norm_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_norm_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_normal__in_place_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_normal__in_place_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_normal_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ones_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ones_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_copy_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_copy_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_copy_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_positive_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_pow_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_prod_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_prod_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rad2deg_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_randn_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reciprocal_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_remainder_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_remainder_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_repeat_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rot90_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rot90_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_round_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_round_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsub_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsub_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsub_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sign_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sign_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sin_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sin_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j0_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j1_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j1_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_entr_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_ndtr_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_ndtr_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_1_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_1_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_3_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_5_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_softmax_with_dtype_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_split_with_sizes_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_split_with_sizes_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_split_with_sizes_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_copy_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_multiple_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_multiple_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_multiple_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_std_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stft_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stft_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_to_size_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_take_along_dim_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tan_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tan_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tan_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tan_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_copy_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_true_divide_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_copy_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_copy_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_mean_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_copy_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vsplit_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vsplit_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vstack_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vstack_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_xlogy_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_T_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bfloat16_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bfloat16_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bool_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bool_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_byte_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cdouble_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cfloat_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cfloat_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_chalf_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_char_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_double_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_double_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_double_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_float_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_float_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_float_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_int_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_int_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_long_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_short_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_short_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_short_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_abs_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_abs_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acosh_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_add_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_add_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_add_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_add_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addcdiv_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addcdiv_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addcmul_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addcmul_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addr_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addr_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_alias_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_alias_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_all_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_amax_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_amin_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_amin_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_arange_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_partial_views_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_scatter_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asin_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asin_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asin_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asinh_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asinh_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atanh_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_1d_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_1d_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_3d_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_3d_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_left_shift_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_or_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_right_shift_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_xor_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_xor_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_block_diag_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_block_diag_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_to_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_to_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bucketize_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bucketize_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cauchy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ceil_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_max_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_max_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_min_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_min_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_min_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clone_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clone_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_physical_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_constant_pad_nd_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_constant_pad_nd_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_constant_pad_nd_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_copysign_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_copysign_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cosh_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_count_nonzero_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_deg2rad_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_deg2rad_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_embed_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_copy_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_copy_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_scatter_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_scatter_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_floor_rounding_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_floor_rounding_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_floor_rounding_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_floor_rounding_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_no_rounding_mode_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_no_rounding_mode_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dstack_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_like_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_strided_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eq_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eq_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_equal_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erfinv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erfinv_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp2_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp2_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_copy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_copy_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expm1_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expm1_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eye_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftshift_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft2_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfftn_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfftn_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft2_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft2_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft2_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftn_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftshift_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftshift_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftshift_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftshift_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfft_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfft_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfftn_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft2_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft2_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfftn_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfftn_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfftn_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfft2_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfft_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfftn_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fill_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fill_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flatten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flatten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flatten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flip_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flip_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flip_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flip_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flip_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flipud_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flipud_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_float_power_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_float_power_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_floor_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_floor_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_floor_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_floor_divide_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmax_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmax_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmax_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmin_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmod_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_frac_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_frexp_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_frexp_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_gcd_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ge_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ge_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ge_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_geometric_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_geometric_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_gt_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_heaviside_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_heaviside_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_heaviside_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_heaviside_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hstack_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hstack_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_imag_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_add_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_add_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_fill_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_fill_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_fill_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_select_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_select_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isfinite_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isinf_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isinf_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isinf_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isnan_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isnan_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isneginf_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isposinf_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isreal_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_item_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_item_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lcm_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_le_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lerp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_cross_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_diagonal_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_svd_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_svdvals_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linspace_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linspace_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linspace_tensor_overload_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linspace_tensor_overload_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log1p_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log1p_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log1p_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_softmax_with_dtype_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_softmax_with_dtype_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logaddexp2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_and_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logspace_tensor_overload_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logsumexp_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_masked_fill_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_maximum_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_mean_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_variadic_tensors_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_variadic_tensors_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_minimum_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nan_to_num_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_copy_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_native_layer_norm_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ne_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ne_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ne_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_strided_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_strided_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_full_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_full_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_full_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_ones_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_ones_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_ones_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_zeros_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nextafter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_celu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_channel_shuffle_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_channel_shuffle_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_elu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_hardshrink_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_hardtanh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_hinge_embedding_loss_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_l1_loss_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_log_softmax_with_dtype_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_log_softmax_with_dtype_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_log_softmax_with_dtype_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_margin_ranking_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_margin_ranking_loss_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_mse_loss_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pairwise_distance_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pairwise_distance_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pairwise_distance_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pixel_unshuffle_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_poisson_nll_loss_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_relu6_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_relu_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_relu_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmin_with_dtype_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softplus_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softplus_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softshrink_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softshrink_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softshrink_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_tanhshrink_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_tanhshrink_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_triplet_margin_loss_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_normal__in_place_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_normal__in_place_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_normal__in_place_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_permute_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_permute_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_pow_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_pow_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_prod_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rad2deg_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rad2deg_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_randn_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ravel_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_real_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_real_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_real_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_remainder_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_repeat_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_as_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_as_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_roll_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rot90_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_round_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsqrt_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsub_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_select_scatter_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_select_scatter_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sgn_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sgn_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sigmoid_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sigmoid_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sigmoid_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_signbit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_signbit_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sin_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sin_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sin_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinc_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinc_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinh_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinh_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinh_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinh_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_softmax_with_dtype_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_bessel_j1_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_bessel_j1_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_entr_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i0e_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i0e_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i1_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i1e_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_log_ndtr_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_log_ndtr_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_log_softmax_with_dtype_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_log_softmax_with_dtype_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_1_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_1_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_3_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_ndtr_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_softmax_with_dtype_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_softmax_with_dtype_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_spherical_bessel_j0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_split_with_sizes_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_split_with_sizes_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_split_with_sizes_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_square_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_squeeze_copy_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_squeeze_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_squeeze_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_stack_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_std_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_std_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_std_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_std_mean_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_std_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sub_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sub_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_to_size_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_t_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_t_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_t_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_t_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_take_along_dim_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_take_along_dim_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tan_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tan_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tan_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tanh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tensor_split_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tensor_split_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tensor_split_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_to_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_transpose_copy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_transpose_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_transpose_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tril_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_triu_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_triu_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_triu_indices_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_true_divide_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unbind_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unbind_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unbind_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unsqueeze_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unsqueeze_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unsqueeze_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unsqueeze_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unsqueeze_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_var_mean_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vdot_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_copy_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vsplit_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vsplit_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vstack_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vstack_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_xlogy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_xlogy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_T_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_T_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_T_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bfloat16_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bfloat16_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bool_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_byte_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_byte_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_byte_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cdouble_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cdouble_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cfloat_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_chalf_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_chalf_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_char_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_double_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_double_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_float_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_half_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_half_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_int_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_int_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_int_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_long_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_short_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_short_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_abs_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acos_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acos_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_add_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addcmul_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addcmul_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addcmul_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addr_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addr_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addr_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_all_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_all_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_allclose_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_amin_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_amin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_any_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_partial_views_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_scatter_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_scatter_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan2_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atanh_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atanh_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atanh_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atanh_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_block_diag_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_block_diag_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_tensors_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_tensors_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_to_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_to_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_to_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_to_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bucketize_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cauchy_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ceil_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_min_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_min_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clone_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clone_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clone_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clone_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_column_stack_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_column_stack_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_column_stack_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_physical_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_constant_pad_nd_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_constant_pad_nd_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_copysign_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cos_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cosh_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cosh_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cosh_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cosh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cosh_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_count_nonzero_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cumprod_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cumsum_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cumsum_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_embed_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_embed_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_scatter_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_no_rounding_mode_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_no_rounding_mode_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dsplit_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dsplit_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dsplit_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dstack_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_strided_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eq_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eq_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erfc_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erfc_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_as_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_as_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_as_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exponential_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exponential_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eye_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eye_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftn_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftn_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftn_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftshift_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftshift_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftshift_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfftn_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfftn_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft2_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft2_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft2_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftn_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftshift_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfft_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfftn_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfftn_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfftn_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft2_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft2_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft2_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfft2_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfftn_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fill_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fill_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flip_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fliplr_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flipud_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flipud_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flipud_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_float_power_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmin_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmin_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmod_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmod_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_frexp_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ge_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_gt_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_heaviside_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_heaviside_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hsplit_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hsplit_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hstack_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hypot_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_add_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_add_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_copy_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_select_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_select_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isclose_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isclose_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isfinite_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isinf_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isinf_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isnan_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isneginf_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isposinf_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isreal_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_item_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lerp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lgamma_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lgamma_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lgamma_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_cross_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_diagonal_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_matrix_norm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_norm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_norm_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_svdvals_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_vecdot_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_vecdot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linspace_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linspace_tensor_overload_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linspace_tensor_overload_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log10_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log10_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log10_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log1p_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log2_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log2_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_softmax_with_dtype_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_softmax_with_dtype_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logaddexp2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logaddexp_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logaddexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_and_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_not_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_or_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_xor_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_xor_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logspace_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logspace_tensor_overload_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logsumexp_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lt_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lt_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_masked_fill_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_masked_fill_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_maximum_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_mean_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_minimum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_movedim_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_movedim_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_movedim_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_movedim_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_mul_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nan_to_num_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_copy_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_neg_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_strided_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_strided_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_full_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_full_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_ones_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_ones_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_zeros_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_channel_shuffle_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_channel_shuffle_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_group_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_hardtanh_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_hardtanh_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_l1_loss_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_l1_loss_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_leaky_relu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_log_softmax_with_dtype_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_log_softmax_with_dtype_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_log_softmax_with_dtype_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_margin_ranking_loss_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_nll_loss_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_pairwise_distance_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_pdist_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_pixel_unshuffle_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_prelu_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_relu6_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_relu6_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_relu_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_smooth_l1_loss_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmax_with_dtype_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmax_with_dtype_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_tanhshrink_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_threshold_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_triplet_margin_loss_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_normal__in_place_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_permute_copy_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_permute_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_permute_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_positive_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_positive_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_pow_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_pow_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_prod_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_prod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rad2deg_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rad2deg_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_real_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_real_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reciprocal_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_remainder_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_as_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_roll_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rot90_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rot90_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rot90_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_round_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsqrt_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsqrt_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_select_scatter_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_select_scatter_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_select_scatter_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sgn_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sgn_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sigmoid_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sigmoid_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sigmoid_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sign_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_signbit_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sin_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinh_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinh_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_softmax_with_dtype_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_softmax_with_dtype_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_bessel_j1_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i0e_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i0e_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i1_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i1_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_1_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_1_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_5_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_ndtri_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_spherical_bessel_j0_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_zeta_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_zeta_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_split_with_sizes_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_split_with_sizes_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sqrt_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sqrt_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sqrt_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sqrt_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sqrt_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_squeeze_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_squeeze_copy_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_squeeze_multiple_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_squeeze_multiple_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_stack_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_stack_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_std_mean_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_std_mean_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_stft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sub_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sub_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_to_size_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_to_size_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_t_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_t_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_t_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_take_along_dim_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tan_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tan_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tanh_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tensor_split_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tensor_split_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tensor_split_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tensor_split_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_to_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_to_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trace_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_transpose_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_transpose_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tril_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_true_divide_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trunc_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trunc_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unbind_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unbind_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_copy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unsqueeze_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unsqueeze_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unsqueeze_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_as_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_copy_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vsplit_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vstack_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vstack_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_where_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_xlogy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_zeros_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_zeros_cuda_int8, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_acos_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_add_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_all_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_allclose_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_any_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_as_strided_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_asin_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_block_diag_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_bucketize_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cartesian_prod_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cdouble_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_ceil_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_char_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cholesky_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cosh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cov_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cumsum_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_diagflat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_diagonal_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_empty_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_empty_strided_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_fft2_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_ifft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_ifftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_flipud_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_full_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_full_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_index_put_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_index_select_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_isclose_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_isreal_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_istft_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_le_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_eigvalsh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_inv_ex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_ldl_solve_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_lu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_lu_factor_ex_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_matrix_norm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_pinv_singular_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_slogdet_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_tensorsolve_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linspace_tensor_overload_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_log2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_logical_or_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_logspace_tensor_overload_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_sum_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_max_binary_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_max_pool2d_with_indices_backward_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_min_reduction_with_dim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_mvlgamma_mvlgamma_p_5_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nanmedian_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nansum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_narrow_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_native_batch_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_new_full_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_avg_pool2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_channel_shuffle_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_conv3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_conv_transpose1d_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_conv_transpose2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_ctc_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_feature_alpha_dropout_without_train_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_hardshrink_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_max_unpool2d_grad_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_pad_circular_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_pad_constant_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_pad_reflect_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_pairwise_distance_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_poisson_nll_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_tanhshrink_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_triplet_margin_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_norm_inf_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_ones_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_ones_like_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_permute_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_permute_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_polar_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_positive_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_put_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_rad2deg_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_randint_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_randn_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_rsub_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_scalar_tensor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_scatter_add_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_scatter_reduce_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_searchsorted_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_sigmoid_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_signal_windows_blackman_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_signal_windows_nuttall_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_sin_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_softmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_sparse_sampled_addmm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_bessel_j0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_chebyshev_polynomial_v_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_erfcx_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_i1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_modified_bessel_i1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_modified_bessel_k0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_scaled_modified_bessel_k0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_sqrt_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_squeeze_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_stack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_std_unbiased_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_svd_lowrank_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_take_along_dim_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_take_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_tensor_split_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_topk_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_transpose_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_transpose_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_triangular_solve_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_trunc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_zero__cuda_complex64, test/test_ops.py::TestCompositeComplianceCUDA::test_backward___rmod___cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward___rsub___cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_addmm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_amin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_atan2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_bmm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_clone_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_constant_pad_nd_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_cross_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_cumsum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_cumulative_trapezoid_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_diagflat_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_diagonal_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_div_trunc_rounding_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_dot_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_expand_as_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_expand_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_fill_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_frexp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_hstack_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_cond_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_eigvalsh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_slogdet_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_log2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_logdet_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_masked_logsumexp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_masked_sum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_masked_var_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_max_binary_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_native_batch_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_batch_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_interpolate_trilinear_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_l1_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_max_unpool3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_mish_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_multi_head_attention_forward_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_prelu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_silu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_pinverse_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_polygamma_polygamma_n_0_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_polygamma_polygamma_n_4_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_softmax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_sort_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_split_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_squeeze_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_stft_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_sum_to_size_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_trunc_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_unfold_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_view_as_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_vstack_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_acosh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_addmm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_addmm_decomposed_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_addr_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_allclose_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_angle_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_argmin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_argwhere_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_as_strided_partial_views_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_asinh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_atleast_1d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_broadcast_tensors_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_cdist_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_ceil_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_cfloat_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_clone_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_conj_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_count_nonzero_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_cummax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_diagflat_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_fft_ihfft_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_flip_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_full_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_geqrf_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_half_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_int_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_isclose_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_isin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_jiterator_2inputs_2outputs_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_ldexp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_le_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_linalg_eigvals_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_linalg_inv_ex_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_linalg_lu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_linalg_matrix_rank_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_linalg_solve_ex_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_logaddexp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_logical_or_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_lu_solve_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_masked_fill_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_masked_std_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_max_pool2d_with_indices_backward_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_mean_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_median_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_min_binary_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_native_layer_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_new_empty_strided_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_bilinear_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_channel_shuffle_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_dropout2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_hardshrink_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_hinge_embedding_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_interpolate_linear_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_max_unpool3d_grad_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_multi_margin_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_relu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_selu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_softplus_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_norm_fro_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_roll_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_rot90_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_scatter_reduce_amin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_scatter_reduce_sum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_sgn_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_special_i1e_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_special_legendre_polynomial_p_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_split_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_sum_to_size_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_take_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_topk_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_var_mean_unbiased_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_var_unbiased_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad__chunk_cat_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_add_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_asinh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_atleast_2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_broadcast_shapes_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_cdouble_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_clamp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_clone_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_contiguous_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_cosh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_div_floor_rounding_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_empty_strided_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_exp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_expand_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fft_fft2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fft_irfftn_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_float_power_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fmod_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_full_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_index_add_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_isin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_isneginf_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_solve_triangular_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_log2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_logsumexp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_amin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_argmax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_cumsum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_logaddexp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_softmin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_min_binary_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_msort_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nanmean_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_narrow_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_new_zeros_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_adaptive_avg_pool2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_avg_pool3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_binary_cross_entropy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_ctc_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_gelu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_grid_sample_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_layer_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_local_response_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_pad_replicate_negative_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_pixel_shuffle_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_poisson_nll_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_rms_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_smooth_l1_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_prod_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_qr_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_quantile_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_randint_like_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_repeat_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_round_decimals_neg_3_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_scatter_reduce_prod_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_signal_windows_nuttall_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_sinh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_chebyshev_polynomial_t_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_chebyshev_polynomial_w_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_hermite_polynomial_he_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_modified_bessel_k1_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_scaled_modified_bessel_k0_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_scaled_modified_bessel_k1_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_spherical_bessel_j0_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_sqrt_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_stack_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_sub_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_sum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_sum_to_size_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_take_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_tanh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_tile_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_transpose_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_trapezoid_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_zero__cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_abs_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_addcmul_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_argmin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_atan_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_cartesian_prod_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_ceil_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_column_stack_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_cross_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_deg2rad_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_digamma_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_div_floor_rounding_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_fft_ihfft2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_floor_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_full_like_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_ge_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_histc_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_hsplit_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_jiterator_binary_return_by_ref_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_inv_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_ldl_factor_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_slogdet_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_logical_and_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_logaddexp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_softmax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_matrix_exp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nansum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nextafter_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_alpha_dropout_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_cosine_similarity_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_dropout2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_glu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_interpolate_area_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_pad_circular_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_pad_constant_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_rrelu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nonzero_static_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_normal_number_mean_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_pinverse_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_polygamma_polygamma_n_0_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_quantile_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_ravel_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_renorm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_repeat_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_resize_as__cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_resolve_conj_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_scalar_tensor_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_scatter_reduce_amin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_scatter_reduce_mean_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_select_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_hermite_polynomial_he_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_modified_bessel_k0_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_split_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_split_with_sizes_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_std_unbiased_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_t_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_take_along_dim_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_trace_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_view_as_complex_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_where_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay___rsub___cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay__softmax_backward_data_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_addmv_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_addr_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_alias_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_allclose_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_argsort_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_as_strided_scatter_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_baddbmm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_bernoulli_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_cfloat_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_cosh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_diagflat_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_diagonal_scatter_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_dstack_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_empty_like_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_fft_irfft_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_fft_irfftn_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_float_power_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_floor_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_index_reduce_mean_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_jiterator_binary_return_by_ref_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_linalg_cond_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_linalg_ldl_solve_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_linalg_lstsq_grad_oriented_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_linalg_multi_dot_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_linalg_norm_subgradients_at_zero_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_linalg_solve_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_logical_not_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_logspace_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_lt_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_masked_amin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_masked_fill_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_masked_log_softmax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_masked_logaddexp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_masked_prod_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_masked_scatter_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_median_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_min_binary_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_mm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nanmedian_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_adaptive_avg_pool1d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_batch_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_conv_transpose1d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_ctc_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_feature_alpha_dropout_without_train_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_hardshrink_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_huber_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_multilabel_margin_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_nll_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_normalize_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_softsign_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_triplet_margin_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_ones_like_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_polygamma_polygamma_n_2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_resize__cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_roll_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_short_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_sigmoid_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_slice_scatter_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_sparse_mm_reduce_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_special_modified_bessel_i1_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_split_list_args_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_squeeze_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_stft_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_svd_lowrank_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_t_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_take_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_unflatten_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_where_cuda_float32, test/test_ops.py::TestMathBitsCUDA::test_conj_view___radd___cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_T_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs__conversions_cfloat_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs__conversions_chalf_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs__conversions_float_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_broadcast_tensors_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_chunk_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_constant_pad_nd_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_contiguous_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_cos_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_cumprod_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_dsplit_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_fft_fft_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_fft_hfft_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_fft_hfftn_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_fft_irfftn_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_hstack_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_isfinite_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_log2_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_log_softmax_with_dtype_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_prod_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_roll_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_sigmoid_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_softmax_with_dtype_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_std_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_tensor_split_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_acosh_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_addbmm_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_addmm_decomposed_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_asin_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_block_diag_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_bool_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_column_stack_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_exp_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_eye_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_fft_fftn_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_fft_hfft_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_fft_ifftn_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_fill_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_index_add_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_jiterator_4inputs_with_extra_args_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_lerp_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_eigh_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_inv_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_ldl_factor_ex_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_ldl_solve_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_lstsq_grad_oriented_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_matrix_norm_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_tensorsolve_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_mH_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_masked_fill_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_masked_prod_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_masked_sum_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_nansum_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_pairwise_distance_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_pixel_unshuffle_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_softmin_with_dtype_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_triplet_margin_loss_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_triplet_margin_with_distance_loss_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_nonzero_static_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_norm_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_ones_like_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_ormqr_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_pinverse_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_randn_like_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_sqrt_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_squeeze_copy_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_sum_to_size_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_transpose_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_uniform_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view___rmatmul___cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_T_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs__conversions_byte_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs__conversions_cdouble_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_acos_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_acosh_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_addcmul_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_as_strided_copy_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_atanh_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_cumsum_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_diag_embed_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_diagonal_scatter_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_div_no_rounding_mode_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_dot_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_empty_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_eq_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_exp2_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_fft_hfft_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_hsplit_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_isinf_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_linalg_vector_norm_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_log_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_logsumexp_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_neg_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_real_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_sinc_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_split_with_sizes_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_t_copy_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_to_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_unbind_copy_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_addcdiv_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_addmv_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_angle_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_cholesky_solve_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_count_nonzero_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_double_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_empty_permuted_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_fft_fft_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_fft_fftn_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_fft_hfft_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_fft_hfftn_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_fft_ifftn_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_flipud_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_eigvalsh_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_matrix_norm_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_pinv_hermitian_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_vecdot_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_log10_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_long_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_mul_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nansum_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_new_ones_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_linear_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_pad_constant_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_softmin_with_dtype_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_norm_inf_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_norm_nuc_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_ones_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_permute_copy_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_permute_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_prod_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_resolve_conj_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_rsub_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_sigmoid_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_std_mean_unbiased_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_stft_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_t_copy_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_tan_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_trapezoid_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_var_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_var_mean_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_view___rmul___cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view___rsub___cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__batch_norm_with_update_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__native_batch_norm_legit_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs__conversions_bfloat16_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs__conversions_bool_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs__conversions_cfloat_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs__conversions_char_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs__conversions_complex_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_addr_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_alias_copy_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_as_strided_copy_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_as_strided_scatter_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_broadcast_to_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_conj_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_cosh_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_dsplit_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_empty_like_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_empty_strided_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_exponential_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_eye_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_ifftn_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_ifftshift_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_ihfft_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fmin_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_ge_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_igammac_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_isfinite_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_linspace_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_logical_and_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_logical_xor_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_masked_fill_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_neg_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_mse_loss_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_rsqrt_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_special_log_softmax_with_dtype_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_special_spherical_bessel_j0_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_squeeze_multiple_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_std_mean_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_sub_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_unbind_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_view_copy_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_vstack_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_add_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_addmv_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_amax_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_angle_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_atanh_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_atleast_2d_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_broadcast_tensors_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_cauchy_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_complex_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_cov_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_diagflat_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_dstack_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_expand_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_fft_fftn_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_fft_hfftn_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_fft_ifftn_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_fft_ihfft_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_fft_irfftn_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_fill_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_fmin_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_half_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_index_reduce_prod_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_isclose_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_isneginf_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_ldexp_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_cond_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_lstsq_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_lu_factor_ex_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_pinv_singular_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_logcumsumexp_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_lt_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_lu_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_masked_norm_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_masked_prod_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_masked_std_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_max_reduction_with_dim_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_mode_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_mvlgamma_mvlgamma_p_1_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_narrow_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_new_empty_strided_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_new_zeros_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_bilinear_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_conv_transpose2d_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_hardswish_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_interpolate_bilinear_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_layer_norm_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_leaky_relu_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_multilabel_margin_loss_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_pixel_shuffle_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_renorm_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_reshape_as_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_roll_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_scatter_add_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_scatter_reduce_prod_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_short_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_signal_windows_general_cosine_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_signal_windows_hamming_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_modified_bessel_i1_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_polygamma_special_polygamma_n_0_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_zeta_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_std_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_sum_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_take_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_trapezoid_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_unbind_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_view_as_complex_cuda_float64, test/test_ops.py::TestFakeTensorCUDA::test_fake___getitem___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake__segment_reduce_offsets_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_any_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_asin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast___getitem___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_arange_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_argmin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_bmm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_cholesky_inverse_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_clamp_max_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_copysign_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_corrcoef_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_cumsum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_einsum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_exp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fft_fft2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fft_rfftn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_gradient_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_hsplit_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_index_add_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_index_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_isinf_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_jiterator_binary_return_by_ref_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_cross_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_eig_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_ldl_factor_ex_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_lstsq_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_pinv_hermitian_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_long_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_lu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_matrix_exp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_max_binary_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_mm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_avg_pool3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_celu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_conv_transpose1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_conv_transpose3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_ctc_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_embedding_bag_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_fractional_max_pool3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_hardshrink_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_logsigmoid_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_margin_ranking_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_max_unpool1d_grad_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_mse_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_pad_replicate_negative_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_rrelu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_silu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_triplet_margin_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_polygamma_polygamma_n_4_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_prod_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_randn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_reciprocal_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_repeat_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_repeat_interleave_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_round_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_signal_windows_gaussian_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_signal_windows_hamming_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_softmax_with_dtype_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_sparse_sampled_addmm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_airy_ai_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_hermite_polynomial_he_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_modified_bessel_k0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_split_list_args_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_split_with_sizes_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_sqrt_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_std_mean_unbiased_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_sum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_svd_lowrank_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_tanh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_torch_ops_aten__flash_attention_forward_cuda_float16, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_trapz_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_unsqueeze_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_unsqueeze_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_view_as_complex_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_zeros_like_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_bfloat16_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_bitwise_left_shift_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_fake_bitwise_right_shift_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_fake_broadcast_to_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_cartesian_prod_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_cholesky_inverse_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_clamp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp___rmatmul___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp__native_batch_norm_legit_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_addmm_decomposed_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_as_strided_partial_views_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_atanh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_atleast_3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_bfloat16_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_cholesky_solve_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_contiguous_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_cumulative_trapezoid_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_dist_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fft_fftshift_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fft_ifft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fft_ihfftn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fft_irfft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fmin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_hypot_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_index_reduce_amax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_index_select_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_lerp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_det_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_inv_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_lstsq_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_lu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_lu_factor_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_vector_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_log2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_lu_unpack_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_mH_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_masked_cumprod_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_masked_normalize_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_masked_select_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_mv_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_cosine_similarity_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_ctc_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_gaussian_nll_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_glu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_interpolate_nearest-exact_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_local_response_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_max_pool3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_soft_margin_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_softmin_with_dtype_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_softplus_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_triplet_margin_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_norm_inf_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_polygamma_polygamma_n_1_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_polygamma_polygamma_n_2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_qr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_remainder_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_roll_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_rsqrt_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_scatter_add_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_scatter_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_sinc_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_special_erfcx_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_std_mean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_std_mean_unbiased_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_std_unbiased_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_t_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_take_along_dim_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_var_mean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_vdot_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_where_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_xlogy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_zero__cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp__native_batch_norm_legit_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_acosh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_addmm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_amax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_angle_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_cholesky_inverse_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_column_stack_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_cummin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_erf_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fft_ifftshift_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_float_power_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fmod_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_grid_sampler_2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_hsplit_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_index_fill_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_index_select_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_inner_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_kron_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_lstsq_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_lu_factor_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_lu_solve_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_matrix_power_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_qr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_svd_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_svdvals_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_tensorsolve_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_vander_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_log_softmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_logaddexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_lu_unpack_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_masked_mean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_masked_median_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_masked_softmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_max_pool2d_with_indices_backward_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_median_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_min_reduction_no_dim_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_mode_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_mul_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_neg_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_avg_pool3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_conv1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_dropout_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_embedding_bag_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_layer_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_max_pool3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_max_unpool1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_max_unpool1d_grad_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_multilabel_margin_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_pad_replicate_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_poisson_nll_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_rms_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_selu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_normal_number_mean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_ormqr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_polygamma_polygamma_n_2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_rad2deg_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_resolve_neg_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_rot90_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_round_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_round_decimals_neg_3_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_scatter_reduce_amin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_special_erfcx_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_split_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_std_mean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_std_unbiased_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_var_mean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_xlogy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_zero__cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_einsum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_eq_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_equal_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_exp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_expand_as_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_fft_ihfftn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_float_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_full_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_index_select_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_int_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_istft_cuda_complex64, test/test_ops.py::TestFakeTensorCUDA::test_fake_jiterator_binary_return_by_ref_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_kron_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_le_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_ldl_factor_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_ldl_factor_ex_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_lstsq_grad_oriented_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_solve_triangular_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_svd_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_log10_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_logical_xor_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_mH_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_masked_argmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_matrix_exp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_mean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_mode_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_native_batch_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_native_dropout_backward_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_new_empty_strided_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_adaptive_avg_pool1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_adaptive_avg_pool2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_adaptive_max_pool2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_hinge_embedding_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_max_pool1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_mish_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_multilabel_margin_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_pad_constant_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_poisson_nll_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nonzero_static_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_norm_inf_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_norm_nuc_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_rsub_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_scatter_reduce_sum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_sigmoid_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_signal_windows_exponential_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_sparse_mm_reduce_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_bessel_y1_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_entr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_hermite_polynomial_h_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_hermite_polynomial_he_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_i0e_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_modified_bessel_i1_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_ndtri_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_topk_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_transpose_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_trunc_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_unfold_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_unravel_index_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_fake_var_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops___getitem___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops___ror___cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops___rpow___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_addr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_alias_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_bitwise_left_shift_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_cauchy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_cdouble_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_column_stack_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_contiguous_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_cos_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_diagonal_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_diagonal_scatter_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_dist_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_eq_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_equal_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_erfc_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fft_fft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fft_fftn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fft_ifftshift_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fft_rfft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_frexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_half_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_heaviside_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_imag_cuda_complex64, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_index_reduce_amin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_jiterator_unary_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_cond_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_matrix_power_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_log1p_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_logspace_tensor_overload_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_lu_solve_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_argmin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_normalize_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_mean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nanmedian_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_narrow_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_native_layer_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nextafter_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_adaptive_avg_pool1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_adaptive_max_pool3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_avg_pool1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_conv3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_embedding_bag_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_interpolate_bilinear_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_pdist_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_tanhshrink_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_normal_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_polygamma_polygamma_n_4_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_randn_like_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_ravel_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_reshape_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_scatter_reduce_sum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_signal_windows_gaussian_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_signal_windows_kaiser_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_softmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_chebyshev_polynomial_u_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_chebyshev_polynomial_w_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_modified_bessel_i0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_zeta_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_square_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_std_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_std_mean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_sub_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_svd_lowrank_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_trapz_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_tril_indices_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_unfold_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_view_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_zeros_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_zeros_like_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_linspace_cuda_complex128, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_linspace_cuda_float16, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_linspace_cuda_int8, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_linspace_tensor_overload_cuda_complex64, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_logspace_cuda_int32, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_logspace_tensor_overload_cuda_complex128, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_arange_cuda_uint8, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_linspace_cuda_int16, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_linspace_cuda_int32, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_linspace_tensor_overload_cuda_int32, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_logspace_cuda_int32, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_logspace_tensor_overload_cuda_bfloat16, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_ones_cuda_float64, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_zeros_cuda_float16, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_zeros_cuda_int16, test/test_ops.py::TestTagsCUDA::test_tags__refs__conversions_chalf_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs__conversions_float_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs__conversions_long_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_arange_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_as_strided_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_atan_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_bucketize_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_conj_physical_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_div_trunc_rounding_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_dot_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_dsplit_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_fft_fftn_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_fft_hfft_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_fft_ifft2_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_fft_irfft_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_flatten_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_hstack_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_index_add_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_index_select_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_le_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_linalg_diagonal_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_linalg_vecdot_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_log2_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_log_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_logaddexp2_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_logaddexp_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_logical_and_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_nll_loss_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_relu6_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_permute_copy_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_repeat_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_reshape_as_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_sigmoid_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_std_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_sub_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_tensor_split_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_alias_copy_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_all_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_as_strided_scatter_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_atleast_2d_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_bmm_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_clamp_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_clamp_max_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_cumulative_trapezoid_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_dot_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_equal_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_expand_as_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_fft_rfft2_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_igamma_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_index_copy_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_index_fill_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_index_reduce_amax_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_int_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_isnan_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_isneginf_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_inv_ex_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_lu_solve_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linspace_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_lt_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_lu_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_masked_median_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_min_reduction_with_dim_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_mul_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_ne_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_neg_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_new_full_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_adaptive_max_pool1d_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_batch_norm_without_cudnn_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_binary_cross_entropy_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_conv3d_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_dropout2d_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_group_norm_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_max_pool2d_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_max_unpool1d_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_multi_head_attention_forward_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_multilabel_margin_loss_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_scaled_dot_product_attention_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_selu_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_softsign_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_tanhshrink_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_outer_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_pinverse_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_polygamma_polygamma_n_0_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_round_decimals_neg_3_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_signal_windows_hamming_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_sparse_sampled_addmm_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_special_entr_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_special_hermite_polynomial_h_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_special_laguerre_polynomial_l_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_square_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_squeeze_multiple_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_tensor_split_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_tril_indices_cuda_int64, test/test_ops.py::TestTagsCUDA::test_tags_unfold_copy_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_unravel_index_cuda_int64 2025-08-14T22:55:09.9841486Z 2025-08-14T22:55:09.9841636Z Running test_nestedtensor 2/3 ... [2025-08-14 22:55:09.772785] 2025-08-14T22:55:09.9841936Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T22:55:09.9842690Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_nestedtensor.py', '-m', 'not serial', '--shard-id=2', '--num-shards=3', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 22:55:09.773382] 2025-08-14T22:58:50.5454353Z 2025-08-14T22:58:50.5455535Z test_ops 5/9 was successful, full logs can be found in artifacts with path test/test-reports/test_ops_5.9_1bb3ffbc12f8c13b_.log 2025-08-14T22:58:50.6477743Z Running 3728 items in this shard: test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs__conversions_cfloat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs__conversions_float_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_as_strided_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_as_strided_partial_views_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_as_strided_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_atleast_2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_copysign_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_empty_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_hsplit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_istft_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_linalg_diagonal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_logaddexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_masked_fill_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_nn_functional_hardtanh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_nn_functional_softmin_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_normal_number_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_t_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_t_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_xlogy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_addcdiv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_cummax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_hsplit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_isin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_svd_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linspace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linspace_tensor_overload_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_masked_logsumexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_binary_cross_entropy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_channel_shuffle_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_feature_alpha_dropout_without_train_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_glu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_interpolate_trilinear_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_local_response_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_margin_ranking_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_max_pool3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_pad_circular_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_pad_replicate_negative_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_soft_margin_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_ones_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_resolve_neg_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_scatter_reduce_sum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_select_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_sparse_mm_reduce_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_special_chebyshev_polynomial_t_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_split_with_sizes_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_unique_consecutive_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_unsqueeze_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_vdot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_view_as_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_xlogy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_zeros_cuda_float32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing__chunk_cat_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_acosh_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_add_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_bfloat16_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_cfloat_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_conj_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_diag_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_double_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_dsplit_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_dstack_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_fft_fftn_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_fft_hfft2_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_hstack_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_lerp_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_narrow_copy_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_nn_functional_conv2d_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_nn_functional_conv3d_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_sub_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_tan_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_unbind_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_zeros_like_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_dtypes___radd___cuda, test/test_ops.py::TestCommonCUDA::test_dtypes___rdiv___cuda, test/test_ops.py::TestCommonCUDA::test_dtypes___rsub___cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__native_batch_norm_legit_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs__conversions_bfloat16_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs__conversions_half_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs__conversions_long_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_acosh_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_allclose_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_amin_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_atanh_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_block_diag_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_cosh_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_deg2rad_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_dstack_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_equal_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_eye_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_fft_fftshift_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_ge_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_heaviside_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_isinf_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_istft_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_linalg_diagonal_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_linalg_svd_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_logaddexp2_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_logical_and_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_mul_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_native_layer_norm_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_new_full_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_alpha_dropout_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_margin_ranking_loss_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_pixel_shuffle_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_triplet_margin_loss_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_permute_copy_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_real_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_sign_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_special_entr_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_special_log_softmax_with_dtype_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_special_ndtr_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_squeeze_copy_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_stack_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_tril_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_argmax_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_atleast_2d_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_bitwise_left_shift_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_broadcast_tensors_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_column_stack_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_contiguous_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_cos_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_cross_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_cumprod_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_empty_like_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_fft_fft2_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_fft_fftshift_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_fft_ifft2_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_fft_rfftn_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_flatten_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_flipud_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_floor_divide_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_full_like_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_heaviside_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_histogram_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_histogramdd_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_i0_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_int_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_isreal_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_lcm_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_eigvalsh_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_matrix_power_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_matrix_rank_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_norm_subgradients_at_zero_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_pinv_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_qr_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_solve_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linspace_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_masked_amin_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_masked_argmin_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_masked_mean_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_max_pool2d_with_indices_backward_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_mul_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_mvlgamma_mvlgamma_p_1_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_mvlgamma_mvlgamma_p_5_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nanquantile_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_avg_pool3d_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_conv2d_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_ctc_loss_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_interpolate_nearest-exact_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_interpolate_trilinear_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_layer_norm_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_triplet_margin_loss_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_normal_number_mean_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_permute_copy_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_permute_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_put_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_randn_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_ravel_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_repeat_interleave_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_resize__cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_round_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_round_decimals_0_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_scalar_tensor_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_scatter_add_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_scatter_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_short_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_signal_windows_hamming_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_special_legendre_polynomial_p_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_special_scaled_modified_bessel_k1_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_special_xlog1py_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_split_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_split_with_sizes_copy_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_stack_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_torch__scaled_mm_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_triangular_solve_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_unbind_copy_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_uniform_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_unique_consecutive_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_unravel_index_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_var_mean_unbiased_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_vsplit_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_zeros_like_cuda, test/test_ops.py::TestCommonCUDA::test_errors___radd___cuda, test/test_ops.py::TestCommonCUDA::test_errors__chunk_cat_cuda, test/test_ops.py::TestCommonCUDA::test_errors_amax_cuda, test/test_ops.py::TestCommonCUDA::test_errors_bitwise_left_shift_cuda, test/test_ops.py::TestCommonCUDA::test_errors_cat_cuda, test/test_ops.py::TestCommonCUDA::test_errors_cauchy_cuda, test/test_ops.py::TestCommonCUDA::test_errors_div_no_rounding_mode_cuda, test/test_ops.py::TestCommonCUDA::test_errors_div_trunc_rounding_cuda, test/test_ops.py::TestCommonCUDA::test_errors_dstack_cuda, test/test_ops.py::TestCommonCUDA::test_errors_fft_irfftn_cuda, test/test_ops.py::TestCommonCUDA::test_errors_fft_rfft2_cuda, test/test_ops.py::TestCommonCUDA::test_errors_fmax_cuda, test/test_ops.py::TestCommonCUDA::test_errors_fmod_cuda, test/test_ops.py::TestCommonCUDA::test_errors_gt_cuda, test/test_ops.py::TestCommonCUDA::test_errors_index_select_cuda, test/test_ops.py::TestCommonCUDA::test_errors_linalg_diagonal_cuda, test/test_ops.py::TestCommonCUDA::test_errors_masked_fill_cuda, test/test_ops.py::TestCommonCUDA::test_errors_narrow_cuda, test/test_ops.py::TestCommonCUDA::test_errors_ne_cuda, test/test_ops.py::TestCommonCUDA::test_errors_nn_functional_adaptive_avg_pool1d_cuda, test/test_ops.py::TestCommonCUDA::test_errors_nn_functional_adaptive_max_pool2d_cuda, test/test_ops.py::TestCommonCUDA::test_errors_nn_functional_avg_pool3d_cuda, test/test_ops.py::TestCommonCUDA::test_errors_nn_functional_huber_loss_cuda, test/test_ops.py::TestCommonCUDA::test_errors_nn_functional_multilabel_margin_loss_cuda, test/test_ops.py::TestCommonCUDA::test_errors_reshape_as_cuda, test/test_ops.py::TestCommonCUDA::test_errors_signal_windows_cosine_cuda, test/test_ops.py::TestCommonCUDA::test_errors_signal_windows_hamming_cuda, test/test_ops.py::TestCommonCUDA::test_errors_signal_windows_nuttall_cuda, test/test_ops.py::TestCommonCUDA::test_errors_sparse_randn_like_layout1_cuda, test/test_ops.py::TestCommonCUDA::test_errors_sparse_randn_like_layout4_cuda, test/test_ops.py::TestCommonCUDA::test_errors_sparse_sum_layout1_cuda, test/test_ops.py::TestCommonCUDA::test_errors_special_hermite_polynomial_h_cuda, test/test_ops.py::TestCommonCUDA::test_errors_t_cuda, test/test_ops.py::TestCommonCUDA::test_errors_true_divide_cuda, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_clamp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_diagonal_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_diff_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_erfc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_floor_divide_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_fmod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_heaviside_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_isneginf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_linalg_eigvals_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_linalg_householder_product_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_linalg_lu_factor_ex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_linalg_matrix_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_linalg_slogdet_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_linalg_vector_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_logit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_lu_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_matmul_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_msort_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_ne_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_nextafter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_nn_functional_avg_pool2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_nn_functional_logsigmoid_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_normal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_polar_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_renorm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_searchsorted_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_special_entr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_special_modified_bessel_i1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_special_modified_bessel_k0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_special_scaled_modified_bessel_k0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_special_spherical_bessel_j0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_special_zeta_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_tensordot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_where_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_H_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices___getitem___cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_addbmm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_any_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_argwhere_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_as_strided_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_asinh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_atleast_3d_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_bitwise_left_shift_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_broadcast_tensors_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_broadcast_tensors_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_cdouble_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_char_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_cholesky_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_conj_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_constant_pad_nd_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_contiguous_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_cummax_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_diagonal_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_div_trunc_rounding_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_dsplit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_empty_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_equal_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_expm1_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_fft_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_fftshift_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_ifftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_ihfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_irfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_ge_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_gt_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_heaviside_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_heaviside_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_hypot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_index_put_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_index_reduce_amin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_isneginf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_isneginf_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_isreal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_jiterator_2inputs_2outputs_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_jiterator_4inputs_with_extra_args_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_jiterator_binary_return_by_ref_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_lgamma_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_lu_factor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_lu_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_matrix_rank_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_norm_subgradients_at_zero_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_svd_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_tensorsolve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linspace_tensor_overload_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_log_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_log_normal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_log_softmax_with_dtype_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_logical_and_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_logical_and_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_long_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_lu_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_amax_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_median_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_prod_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_softmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_sum_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_max_binary_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_max_pool2d_with_indices_backward_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_maximum_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_movedim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_multinomial_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_mvlgamma_mvlgamma_p_5_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nanmean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nanmedian_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_neg_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_new_zeros_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_binary_cross_entropy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_celu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_channel_shuffle_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_cosine_embedding_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_group_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_hardshrink_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_hinge_embedding_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_interpolate_bicubic_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_interpolate_nearest-exact_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_l1_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_max_pool2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_pad_constant_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_pad_reflect_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_rms_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_smooth_l1_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_softshrink_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_triplet_margin_with_distance_loss_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_norm_inf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_norm_nuc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_ones_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_ones_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_ones_like_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_ormqr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_permute_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_pow_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_randint_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_reciprocal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_remainder_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_renorm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_resize_as__cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_scatter_reduce_sum_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_signal_windows_gaussian_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_slice_scatter_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_sort_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_sort_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_sparse_mm_reduce_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_sparse_sampled_addmm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_bessel_y1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_chebyshev_polynomial_v_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_chebyshev_polynomial_w_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_laguerre_polynomial_l_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_laguerre_polynomial_l_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_modified_bessel_i1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_modified_bessel_i1_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_ndtr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_ndtri_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_scaled_modified_bessel_k0_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_shifted_chebyshev_polynomial_t_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_zeta_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_std_unbiased_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_t_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_tanh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_to_sparse_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_trace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_unique_consecutive_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_view_as_cuda_float32, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values___ror___cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_amax_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_broadcast_to_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_cat_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_chalf_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_column_stack_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_count_nonzero_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_diag_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_empty_permuted_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_eq_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_fft_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_ifftshift_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_ihfftn_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_rfftn_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_flatten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_gather_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_index_add_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_masked_scatter_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_meshgrid_variadic_tensors_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_nan_to_num_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_nn_functional_pixel_shuffle_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_polygamma_polygamma_n_1_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_reciprocal_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_repeat_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_repeat_interleave_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_scatter_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_signbit_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_sinh_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_slice_scatter_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_bessel_j1_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_bessel_y1_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_chebyshev_polynomial_v_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_legendre_polynomial_p_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_log_ndtr_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_shifted_chebyshev_polynomial_u_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_split_with_sizes_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_squeeze_multiple_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_torch_ops_aten__safe_softmax_default_cuda_bool, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples___rmul___cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples__batch_norm_with_update_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples__unsafe_masked_index_put_accumulate_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_abs_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_acosh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_add_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_add_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_addcdiv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_addmm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_addmm_decomposed_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_addmm_decomposed_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_addmv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_addr_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_addr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_addr_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_alias_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_allclose_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_any_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_argmax_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_argmin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_as_strided_partial_views_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_as_strided_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_asinh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_atan2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_atan_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_atleast_1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_baddbmm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_bfloat16_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_bfloat16_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_bincount_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_bitwise_or_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_bool_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_bucketize_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_byte_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cauchy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_ceil_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_chalf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cholesky_inverse_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_clone_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_corrcoef_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cummax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_diag_embed_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_diagflat_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_diagonal_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_diagonal_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_diagonal_scatter_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_diff_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_dist_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_div_floor_rounding_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_div_no_rounding_mode_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_double_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_dstack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_empty_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_expand_as_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_expand_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_expand_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_expm1_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_fft2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_hfft_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_hfftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_ifft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_ifft2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_ihfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_floor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_full_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_full_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_heaviside_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_histc_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_hstack_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_hypot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_index_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_index_fill_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_index_select_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isclose_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isnan_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isposinf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_jiterator_2inputs_2outputs_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_lerp_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_cholesky_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_det_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_eigvalsh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_householder_product_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_ldl_factor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_lstsq_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_lstsq_grad_oriented_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_lu_factor_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_matrix_rank_hermitian_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_multi_dot_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_pinv_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_pinv_hermitian_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_slogdet_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_tensorinv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_vecdot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linspace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_log_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logical_or_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logical_xor_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logspace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logspace_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_long_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_long_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_lu_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mT_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_logsumexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_scatter_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_matmul_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_matrix_exp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_meshgrid_list_of_tensors_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_min_reduction_with_dim_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_minimum_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_movedim_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mul_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_multinomial_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nan_to_num_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_narrow_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_ne_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_new_empty_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_new_full_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_new_ones_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_celu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_fractional_max_pool3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_gaussian_nll_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_margin_ranking_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_margin_ranking_loss_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_multi_margin_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_multilabel_margin_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_one_hot_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pad_circular_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pad_replicate_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pad_replicate_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pad_replicate_negative_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_relu6_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_rms_norm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_soft_margin_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_softmin_with_dtype_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nonzero_static_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_norm_fro_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_outer_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_permute_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_polygamma_polygamma_n_4_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_quantile_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_rad2deg_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_randint_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_randint_like_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_real_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_renorm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_resize_as__cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_rsub_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_scatter_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_scatter_reduce_amax_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_scatter_reduce_prod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sigmoid_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_signal_windows_general_hamming_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_signal_windows_hann_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sin_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_slice_scatter_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_airy_ai_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_erfcx_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_i0e_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_modified_bessel_i0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_modified_bessel_k0_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_ndtr_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_split_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_split_list_args_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_std_mean_unbiased_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sum_to_size_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_t_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_tile_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_to_sparse_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_to_sparse_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_topk_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_torch_ops_aten__safe_softmax_default_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_tril_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_triu_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unbind_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unflatten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unfold_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unsqueeze_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_var_mean_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_vdot_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_view_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_view_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_zero__cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_zeros_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_zeros_cuda_int64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_aminmax_cuda_int64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_clamp_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_flatten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_item_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_numpy_ref_item_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_jiterator_2inputs_2outputs_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_linalg_tensorinv_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_numpy_ref_nn_functional_conv_transpose3d_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_nn_functional_l1_loss_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_numpy_ref_nn_functional_pairwise_distance_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_numpy_ref_nn_functional_rms_norm_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_ravel_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_numpy_ref_roll_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_numpy_ref_signal_windows_gaussian_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_signal_windows_hamming_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_signal_windows_kaiser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_signal_windows_nuttall_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_squeeze_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_squeeze_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_numpy_ref_tensor_split_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_numpy_ref_tril_indices_cuda_int64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_where_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_out___rmod___cuda_float32, test/test_ops.py::TestCommonCUDA::test_out___rxor___cuda_int64, test/test_ops.py::TestCommonCUDA::test_out__refs__conversions_bool_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_acosh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_atan_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_atleast_2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_block_diag_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_clamp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_column_stack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_deg2rad_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_div_floor_rounding_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_div_no_rounding_mode_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_erf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_exp2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_expand_as_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_exponential_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_fft_irfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_fft_rfftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_floor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_igamma_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_index_fill_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_isclose_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_istft_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out__refs_linalg_svdvals_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_linspace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_log_normal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_logical_and_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_new_ones_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_mish_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_pdist_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_prelu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_relu6_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_selu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_softplus_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_roll_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_sigmoid_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_special_log_ndtr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_special_multigammaln_mvlgamma_p_5_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_sum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_take_along_dim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_tanh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_true_divide_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__segment_reduce_offsets_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_broadcast_shapes_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_cauchy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_cdouble_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_conj_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_cos_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_div_no_rounding_mode_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_fft_ihfftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_gradient_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_hstack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_index_reduce_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_index_select_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_isin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_jiterator_2inputs_2outputs_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_lu_factor_ex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_matrix_rank_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_norm_subgradients_at_zero_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_solve_ex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_svd_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_log_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_log_normal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_logical_or_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_masked_fill_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_masked_logaddexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_masked_softmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_native_batch_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_native_dropout_backward_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_neg_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_new_empty_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_adaptive_max_pool3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_bilinear_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_celu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_hinge_embedding_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_logsigmoid_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_pad_reflect_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_relu6_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nonzero_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error__native_batch_norm_legit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_acosh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_addbmm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_addcdiv_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_cholesky_inverse_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_cos_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_erfc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_exp2_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_fft_fftn_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_fft_fftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_fft_hfft_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_fft_hfftn_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_fft_ifft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_fft_ihfftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_fft_irfftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_float_power_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_fmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_index_reduce_prod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_kron_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_lerp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linalg_det_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linalg_householder_product_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linalg_slogdet_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linalg_slogdet_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linalg_tensorinv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linspace_tensor_overload_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_log2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_logaddexp2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_logsumexp_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_lu_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_maximum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_min_reduction_no_dim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_nanquantile_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_nn_functional_linear_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_norm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_norm_nuc_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_outer_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_remainder_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_renorm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_round_decimals_neg_3_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_rsqrt_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_sin_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_sinh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_softmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_sparse_sampled_addmm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_special_ndtri_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_special_polygamma_special_polygamma_n_0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_special_xlog1py_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_split_with_sizes_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_square_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_squeeze_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_svd_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_take_along_dim_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_triu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_vdot_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_vstack_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_round_decimals_3_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_scatter_reduce_amin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_sinh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_slice_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_sparse_mm_reduce_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_special_bessel_j1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_special_log_ndtr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_special_scaled_modified_bessel_k1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_split_with_sizes_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_sqrt_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_squeeze_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_std_unbiased_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_tensor_split_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_unfold_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_unique_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_warning___radd___cuda, test/test_ops.py::TestCommonCUDA::test_out_warning___rmatmul___cuda, test/test_ops.py::TestCommonCUDA::test_out_warning___rxor___cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__chunk_cat_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_T_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs__conversions_bool_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs__conversions_byte_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs__conversions_float_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_amax_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_atan_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_cat_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_ceil_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_clamp_max_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_column_stack_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_empty_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_exponential_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_fft_fft_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_fft_ifftshift_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_fft_irfft2_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_flipud_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_gcd_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_index_add_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_lgamma_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_linalg_norm_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_logaddexp_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_mean_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_new_zeros_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_hardshrink_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_mish_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_threshold_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_norm_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_normal_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_randn_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_real_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_reciprocal_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_special_i1e_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_square_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_squeeze_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_t_copy_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_t_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_to_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_view_as_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_xlogy_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__segment_reduce_lengths_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_addmv_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_argmax_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_argmin_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_bernoulli_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_bucketize_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_ceil_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_char_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_clamp_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_copysign_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_cummax_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_diag_embed_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_dot_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_empty_permuted_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_erfc_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_expand_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_fft_fft2_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_fft_ifft2_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_fft_ihfftn_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_fmin_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_geometric_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_hash_tensor_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_index_add_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_lerp_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_eig_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_lu_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_solve_triangular_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linspace_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_masked_mean_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_masked_std_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_mvlgamma_mvlgamma_p_1_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_mvlgamma_mvlgamma_p_5_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nanmean_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_new_empty_strided_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_adaptive_max_pool3d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_conv_transpose1d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_dropout2d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_hinge_embedding_loss_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_local_response_norm_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_normalize_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_pad_reflect_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_pad_replicate_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_relu6_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_softshrink_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_triplet_margin_loss_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_normal_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_pinverse_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_polygamma_polygamma_n_4_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_put_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_qr_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_randint_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_reciprocal_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_renorm_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_scatter_add_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_signal_windows_exponential_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_signal_windows_gaussian_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_signal_windows_nuttall_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_sort_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_hermite_polynomial_h_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_laguerre_polynomial_l_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_ndtri_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_std_mean_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_sub_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_tile_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_transpose_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_unflatten_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_var_mean_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_var_unbiased_cuda, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_asin_cuda_int16, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_atan2_cuda_int8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_copysign_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_cos_cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_cos_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_cosh_cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_erf_cuda_int16, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_erf_cuda_int32, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_erf_cuda_int64, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_exp2_cuda_int16, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_exp2_cuda_int8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_exp_cuda_int16, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_float_power_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_i0_cuda_int32, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_lgamma_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_log10_cuda_int64, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_log10_cuda_int8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_log10_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_log2_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_logit_cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_mvlgamma_mvlgamma_p_3_cuda_int8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_mvlgamma_mvlgamma_p_3_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_polygamma_polygamma_n_0_cuda_int64, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_polygamma_polygamma_n_1_cuda_int32, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_polygamma_polygamma_n_2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_polygamma_polygamma_n_3_cuda_int16, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_reciprocal_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_rsqrt_cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_rsqrt_cuda_int16, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_sigmoid_cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_sigmoid_cuda_int16, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_chebyshev_polynomial_t_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_chebyshev_polynomial_u_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_hermite_polynomial_he_cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_hermite_polynomial_he_cuda_int32, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_hermite_polynomial_he_cuda_int8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_laguerre_polynomial_l_cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_laguerre_polynomial_l_cuda_int64, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_shifted_chebyshev_polynomial_t_cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_shifted_chebyshev_polynomial_w_cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_xlog1py_cuda_int32, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_xlog1py_cuda_int8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_sqrt_cuda_int8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_tanh_cuda_int32, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_tanh_cuda_int8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_true_divide_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_T_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_T_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bool_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bool_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_byte_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cdouble_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cfloat_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_chalf_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_chalf_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_complex_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_double_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_double_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_float_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_half_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_int_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_long_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_short_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_short_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_abs_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_acos_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_acos_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_acos_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_acos_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_acosh_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_acosh_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_addcdiv_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_addcmul_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_alias_copy_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_alias_copy_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_all_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_all_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_amax_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_amin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_any_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_any_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_partial_views_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_scatter_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_scatter_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_asin_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_asin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_asinh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_asinh_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atan2_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atan_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atan_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atan_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_1d_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_2d_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_2d_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_3d_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_3d_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_3d_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_and_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_and_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_not_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_or_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_right_shift_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_right_shift_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_xor_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_block_diag_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_tensors_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bucketize_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bucketize_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cat_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cat_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ceil_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_min_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clone_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_column_stack_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_column_stack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_column_stack_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_column_stack_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_physical_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_physical_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_constant_pad_nd_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_constant_pad_nd_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_constant_pad_nd_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_contiguous_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cos_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cos_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cos_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cosh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cosh_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cosh_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cosh_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_count_nonzero_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cumprod_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cumsum_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cumsum_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_deg2rad_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_embed_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_embed_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_embed_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_copy_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_digamma_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_digamma_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_div_no_rounding_mode_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_div_no_rounding_mode_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_div_trunc_rounding_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_div_trunc_rounding_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_dsplit_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_dstack_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_dstack_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_dstack_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_like_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_like_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_strided_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_eq_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_equal_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_erfinv_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_erfinv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_erfinv_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_exp2_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_exp2_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_exp_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_as_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expm1_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expm1_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_eye_cuda_float8_e4m3fnuz, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_eye_cuda_float8_e5m2, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftshift_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft2_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft2_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft2_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftn_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftshift_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfftn_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft2_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfftn_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfftn_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfft_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfftn_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fill_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fill_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flatten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flatten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flatten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flip_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flipud_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_floor_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fmin_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fmod_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_gcd_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_gcd_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ge_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_geometric_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_geometric_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_geometric_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_gt_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_heaviside_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_hsplit_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_hsplit_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_igamma_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_add_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_add_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_copy_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isclose_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isfinite_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isfinite_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isinf_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isnan_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isneginf_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isreal_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isreal_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isreal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_istft_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_item_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_item_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_le_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_le_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_lerp_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_lerp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_lgamma_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_cross_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_diagonal_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_matrix_norm_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_matrix_norm_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_norm_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_svdvals_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linspace_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linspace_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linspace_tensor_overload_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log10_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log10_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log1p_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log1p_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log_normal_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logaddexp_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_and_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_not_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_or_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_xor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_xor_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logspace_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logspace_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logspace_tensor_overload_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logspace_tensor_overload_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logsumexp_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_lt_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_lt_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_masked_fill_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_mean_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_list_of_tensors_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_list_of_tensors_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_list_of_tensors_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_list_of_tensors_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_minimum_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_minimum_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_minimum_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_movedim_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_mul_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_mul_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_mul_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nan_to_num_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_native_layer_norm_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ne_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ne_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ne_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ne_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_neg_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_strided_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_strided_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_ones_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_ones_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_zeros_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_alpha_dropout_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_channel_shuffle_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_elu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_glu_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_hardtanh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_l1_loss_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_log_softmax_with_dtype_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pairwise_distance_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pixel_unshuffle_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pixel_unshuffle_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_poisson_nll_loss_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_poisson_nll_loss_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_relu6_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_relu6_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_relu_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_smooth_l1_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_smooth_l1_loss_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softmax_with_dtype_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softmin_with_dtype_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_tanhshrink_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_tanhshrink_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_threshold_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_triplet_margin_loss_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_triplet_margin_loss_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_normal__in_place_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_normal_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_normal_number_mean_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_normal_number_mean_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ones_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_permute_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_permute_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_permute_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_permute_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_permute_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_permute_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_permute_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_permute_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_permute_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_positive_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_pow_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_prod_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_prod_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rad2deg_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rad2deg_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ravel_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ravel_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_real_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_real_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reciprocal_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reciprocal_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reciprocal_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reciprocal_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_remainder_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_roll_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_roll_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rot90_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rot90_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_round_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rsqrt_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rsqrt_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sgn_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sgn_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sigmoid_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sigmoid_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_signbit_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sinc_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sinh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_softmax_with_dtype_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_softmax_with_dtype_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_softmax_with_dtype_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_softmax_with_dtype_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_bessel_j0_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_entr_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_entr_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_erfcx_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i0e_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i1_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i1e_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_log_ndtr_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_log_ndtr_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_log_ndtr_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_1_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_1_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_spherical_bessel_j0_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_spherical_bessel_j0_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_xlog1py_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_zeta_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_zeta_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_split_with_sizes_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sqrt_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sqrt_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_square_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_square_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_square_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_squeeze_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_squeeze_copy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_squeeze_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_squeeze_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_squeeze_multiple_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_stack_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_stack_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_std_mean_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_to_size_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_t_copy_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_t_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_t_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_take_along_dim_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tan_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tanh_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tensor_split_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_to_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_to_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_trace_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_trace_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tril_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_triu_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_true_divide_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_trunc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_trunc_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unbind_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unbind_copy_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unsqueeze_copy_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unsqueeze_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unsqueeze_copy_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unsqueeze_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unsqueeze_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unsqueeze_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_var_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_as_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_as_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_as_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_copy_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_vsplit_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_vstack_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_xlogy_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_zeros_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_zeros_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs__conversions_complex_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_exponential_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_float_power_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fmax_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_gcd_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_le_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_logspace_tensor_overload_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_narrow_copy_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_narrow_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_nn_functional_poisson_nll_loss_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_roll_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_t_copy_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_trace_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_true_divide_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_view_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_where_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_T_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_T_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_byte_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_byte_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_complex_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_float_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_half_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_half_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_int_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_polar_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acos_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acosh_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acosh_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcdiv_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcmul_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addr_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_alias_copy_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_alias_copy_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_alias_copy_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amax_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amax_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_copy_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_copy_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_partial_views_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_partial_views_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_partial_views_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan2_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_and_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_not_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_right_shift_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_block_diag_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_block_diag_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_tensors_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_tensors_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cauchy_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_max_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_constant_pad_nd_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_copysign_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_copysign_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_count_nonzero_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_count_nonzero_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumprod_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumprod_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumsum_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumsum_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumsum_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_scatter_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_scatter_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_floor_rounding_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_trunc_rounding_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_trunc_rounding_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dot_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erf_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfc_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfinv_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp2_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp2_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_as_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_as_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_copy_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expm1_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expm1_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exponential_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eye_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eye_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft2_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft2_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftn_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft2_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfftn_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft2_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft2_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft2_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft2_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfftn_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft2_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfftn_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft2_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfftn_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flip_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flipud_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flipud_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_float_power_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_float_power_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_float_power_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_divide_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmax_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmin_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmod_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmod_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_frac_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_frexp_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gt_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gt_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_heaviside_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hypot_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_fill_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_fill_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isclose_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isclose_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isnan_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isnan_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_item_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_le_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_le_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lerp_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lgamma_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lgamma_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_cross_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_cross_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_cross_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_diagonal_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_norm_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_svdvals_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_tensor_overload_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_tensor_overload_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log10_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log1p_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log1p_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log2_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log2_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logaddexp2_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logaddexp_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_and_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_and_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_not_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_not_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_or_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_or_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_or_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_xor_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_xor_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logspace_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logsumexp_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_maximum_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_variadic_tensors_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_variadic_tensors_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_minimum_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_minimum_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mul_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mul_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nan_to_num_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_native_layer_norm_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_neg_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_neg_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_strided_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_strided_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_zeros_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_zeros_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_celu_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_celu_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_dropout_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_glu_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_group_norm_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_group_norm_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hardshrink_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hardtanh_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hardtanh_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_l1_loss_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_margin_ranking_loss_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_margin_ranking_loss_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_mish_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_mse_loss_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pairwise_distance_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pairwise_distance_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pairwise_distance_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pdist_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pixel_unshuffle_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pixel_unshuffle_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softplus_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_tanhshrink_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_copy_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_positive_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_prod_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reciprocal_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_remainder_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_remainder_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_renorm_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_repeat_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rot90_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_round_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsqrt_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsqrt_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsub_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsub_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_select_scatter_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_select_scatter_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sign_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sign_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_signbit_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sin_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinc_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinc_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j0_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j0_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j1_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_entr_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_erfcx_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i0e_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i0e_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i0e_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1e_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_ndtr_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_ndtr_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_logit_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_logit_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_1_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_3_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_5_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtr_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtr_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtri_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtri_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_spherical_bessel_j0_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_spherical_bessel_j0_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_zeta_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_split_with_sizes_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_split_with_sizes_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_copy_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_copy_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_std_mean_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_std_mean_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stft_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_to_size_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_to_size_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_copy_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_take_along_dim_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_copy_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_copy_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_true_divide_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trunc_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trunc_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_copy_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_copy_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_copy_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_copy_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_copy_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vdot_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_complex_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_copy_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_copy_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vsplit_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_where_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_where_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_xlogy_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_T_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bfloat16_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bool_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_byte_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_byte_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_byte_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cdouble_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cfloat_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_chalf_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_char_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_char_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_complex_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_double_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_double_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_double_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_half_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_half_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_int_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_long_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_long_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_long_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_long_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_abs_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_abs_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acos_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acos_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acos_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acos_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acosh_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acosh_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addcmul_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addr_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addr_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addr_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_alias_copy_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_alias_copy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_alias_copy_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_allclose_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_amax_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_any_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_any_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_any_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_any_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_any_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_arange_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_arange_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_arange_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_copy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_partial_views_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_partial_views_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_scatter_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_scatter_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asin_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asinh_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan2_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan2_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan2_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_1d_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_1d_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_3d_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_3d_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_left_shift_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_not_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_block_diag_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_block_diag_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_block_diag_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_block_diag_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_to_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bucketize_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cat_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cat_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ceil_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ceil_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_max_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_min_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clone_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_column_stack_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_column_stack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_column_stack_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_physical_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_constant_pad_nd_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_copysign_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cos_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cos_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cos_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cosh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cosh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cosh_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cosh_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cumprod_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cumsum_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_deg2rad_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_deg2rad_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_embed_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_floor_rounding_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_no_rounding_mode_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_trunc_rounding_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_trunc_rounding_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_like_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_strided_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_strided_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_strided_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_strided_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eq_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eq_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_equal_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erf_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erf_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erf_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erf_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erfc_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erfinv_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_copy_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expm1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exponential_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eye_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eye_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft2_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftn_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftn_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftn_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftshift_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftshift_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfftn_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftn_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftn_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfft2_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfftn_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft2_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfftn_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfftn_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfftn_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfft2_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfft2_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfftn_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fill_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fliplr_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flipud_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flipud_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_floor_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_floor_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_floor_divide_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmax_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmod_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmod_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmod_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_frexp_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_gcd_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ge_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_geometric_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_gt_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_gt_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_heaviside_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_heaviside_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_heaviside_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hsplit_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hstack_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hstack_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hypot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_i0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_i0_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_i0_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_imag_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_add_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_copy_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_copy_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isclose_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isinf_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isnan_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isnan_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isnan_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isreal_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isreal_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_le_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lerp_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lerp_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_cross_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_diagonal_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_diagonal_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_norm_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linspace_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log10_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log10_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log1p_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log1p_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_softmax_with_dtype_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logaddexp2_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logaddexp_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_and_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_and_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_or_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_or_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_xor_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_xor_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logspace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logspace_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logspace_tensor_overload_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lt_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lt_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_masked_fill_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_masked_fill_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_maximum_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_mean_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_minimum_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_minimum_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_movedim_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_movedim_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_movedim_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_mul_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_copy_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ne_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ne_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_neg_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_neg_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_strided_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_strided_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_ones_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_ones_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_ones_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_ones_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nextafter_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nextafter_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_alpha_dropout_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_celu_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_channel_shuffle_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_dropout_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_elu_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_glu_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_glu_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_hardshrink_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_hardshrink_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_hardtanh_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_huber_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_l1_loss_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_l1_loss_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_layer_norm_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_log_softmax_with_dtype_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_log_softmax_with_dtype_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_margin_ranking_loss_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_margin_ranking_loss_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_margin_ranking_loss_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_mse_loss_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_nll_loss_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_nll_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_nll_loss_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pixel_unshuffle_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pixel_unshuffle_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pixel_unshuffle_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_poisson_nll_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_poisson_nll_loss_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_poisson_nll_loss_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_poisson_nll_loss_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmax_with_dtype_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmin_with_dtype_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_tanhshrink_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_tanhshrink_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_threshold_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_threshold_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_normal_number_mean_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ones_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_permute_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_permute_copy_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_permute_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_permute_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_permute_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_positive_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_pow_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_prod_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_prod_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_prod_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_prod_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rad2deg_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rad2deg_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rad2deg_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_real_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_real_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reciprocal_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_as_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_as_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_roll_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_roll_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rot90_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_round_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_round_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsub_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsub_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sgn_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sigmoid_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sign_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sign_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_signbit_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinc_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinc_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinc_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinc_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_softmax_with_dtype_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_entr_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_erfcx_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i1_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i1_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i1e_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_log_softmax_with_dtype_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_log_softmax_with_dtype_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_log_softmax_with_dtype_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_log_softmax_with_dtype_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_1_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_5_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_ndtr_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_ndtr_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_softmax_with_dtype_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_spherical_bessel_j0_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_xlog1py_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_xlog1py_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_split_with_sizes_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_split_with_sizes_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sqrt_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sqrt_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_squeeze_copy_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_squeeze_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_squeeze_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_squeeze_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_squeeze_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_squeeze_multiple_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_stack_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_std_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_std_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_std_mean_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sub_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_to_size_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_to_size_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_t_copy_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_t_copy_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_t_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_t_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tan_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tan_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tan_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tanh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tanh_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tensor_split_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tensor_split_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_trace_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_trace_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_trace_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_transpose_copy_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_transpose_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_transpose_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_transpose_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_triu_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_trunc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unbind_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unflatten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_copy_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unsqueeze_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_var_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_as_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_as_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_as_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_copy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_copy_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vsplit_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vstack_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vstack_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vstack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_xlogy_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_xlogy_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_xlogy_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_xlogy_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_zeros_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_zeros_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bool_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_byte_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cdouble_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_char_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_char_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_char_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_complex_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_complex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_double_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_int_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_long_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_long_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_polar_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_short_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_abs_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acos_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acos_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acosh_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acosh_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_add_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addcdiv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addr_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addr_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addr_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_alias_copy_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_all_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_allclose_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_amax_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_any_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_any_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_partial_views_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_partial_views_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_partial_views_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_scatter_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asinh_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asinh_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan2_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atanh_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atanh_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_1d_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_1d_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_1d_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_2d_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_2d_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_3d_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_3d_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_and_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_not_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_not_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_or_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_right_shift_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_xor_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_xor_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_xor_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_block_diag_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_block_diag_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_tensors_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_to_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_to_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bucketize_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bucketize_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cat_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cat_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cat_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ceil_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_chunk_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_chunk_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_max_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_min_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clone_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clone_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_column_stack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_physical_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_constant_pad_nd_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_constant_pad_nd_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_constant_pad_nd_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_contiguous_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_contiguous_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_contiguous_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cosh_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_count_nonzero_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cumsum_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_deg2rad_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_embed_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_embed_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_copy_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_digamma_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_digamma_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_floor_rounding_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_floor_rounding_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_floor_rounding_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_no_rounding_mode_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_trunc_rounding_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dot_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dsplit_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dstack_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_strided_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eq_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eq_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_equal_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_equal_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_equal_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_equal_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erf_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erfinv_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erfinv_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp2_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_as_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eye_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft2_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft2_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft2_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft2_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftshift_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftshift_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft2_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft2_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfftn_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfftn_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftn_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftn_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftshift_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfft2_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfftn_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfftn_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfft2_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfft2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfft2_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfftn_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fill_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fill_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fill_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flatten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flip_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fliplr_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flipud_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_float_power_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_floor_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_floor_divide_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmin_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmin_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmod_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ge_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_geometric_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_geometric_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_gt_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_heaviside_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hsplit_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hstack_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hstack_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hstack_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hstack_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hstack_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_i0_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_igammac_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_add_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_add_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_copy_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_select_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_select_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isclose_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isinf_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isnan_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isposinf_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isposinf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isreal_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isreal_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_item_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_item_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_item_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lerp_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_cross_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_cross_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_diagonal_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_diagonal_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_diagonal_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_norm_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_svd_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_svdvals_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_vecdot_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linspace_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linspace_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log10_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log1p_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log1p_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log1p_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_softmax_with_dtype_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_not_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_not_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_or_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_xor_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_xor_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logspace_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logspace_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logspace_tensor_overload_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logspace_tensor_overload_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lt_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_masked_fill_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_masked_fill_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_maximum_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_mean_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_list_of_tensors_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_list_of_tensors_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_variadic_tensors_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_minimum_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_movedim_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nan_to_num_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nan_to_num_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ne_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ne_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ne_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_neg_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_strided_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_full_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_zeros_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_zeros_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_zeros_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_zeros_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nextafter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_alpha_dropout_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_channel_shuffle_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_channel_shuffle_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_channel_shuffle_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_dropout_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_elu_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_gelu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_glu_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_hardtanh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_hinge_embedding_loss_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_log_softmax_with_dtype_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_mish_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_mse_loss_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_mse_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_pairwise_distance_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_pairwise_distance_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_pairwise_distance_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_pairwise_distance_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_pixel_shuffle_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_prelu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmax_with_dtype_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmin_with_dtype_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmin_with_dtype_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmin_with_dtype_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softshrink_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_tanhshrink_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_tanhshrink_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_threshold_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_triplet_margin_loss_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_norm_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_norm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_normal_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ones_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ones_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_permute_copy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_pow_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_pow_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_pow_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_prod_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_prod_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_prod_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_prod_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rad2deg_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ravel_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ravel_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ravel_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_real_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_real_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reciprocal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_remainder_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_renorm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_repeat_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_as_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_roll_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rot90_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rot90_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_round_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsqrt_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsub_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsub_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sgn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sign_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_signbit_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_signbit_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sin_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sin_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sin_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinc_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_bessel_j0_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_bessel_j0_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_bessel_j0_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_entr_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i0e_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i1e_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i1e_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i1e_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i1e_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_log_ndtr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_logit_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_5_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_softmax_with_dtype_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_xlog1py_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_xlog1py_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_split_with_sizes_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_square_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_square_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_square_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_square_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_square_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_squeeze_copy_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_squeeze_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_squeeze_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_squeeze_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_squeeze_multiple_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_squeeze_multiple_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_stack_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_stack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_stack_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_std_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_stft_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sub_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sub_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_t_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_t_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_t_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_take_along_dim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tan_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tanh_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tensor_split_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tensor_split_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tensor_split_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_to_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trace_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trace_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_transpose_copy_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_transpose_copy_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tril_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tril_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_triu_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_true_divide_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_true_divide_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unbind_copy_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unbind_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unbind_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unflatten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unflatten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_copy_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unsqueeze_copy_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unsqueeze_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unsqueeze_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_var_mean_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vdot_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_as_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_as_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_copy_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vsplit_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_where_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_where_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_xlogy_cuda_int32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager___getitem___cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager___rmatmul___cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager___rmod___cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager__native_batch_norm_legit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager__segment_reduce_offsets_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager__unsafe_masked_index_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_abs_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_abs_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_addbmm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_alias_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_all_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_aminmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_angle_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_argsort_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_as_strided_partial_views_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_atan_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_bmm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_broadcast_tensors_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cdouble_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_chalf_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_clamp_min_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_combinations_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cos_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cummin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_diff_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_div_no_rounding_mode_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_equal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_erfc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_eye_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_fft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_hfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_flipud_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_float_power_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_geqrf_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_gt_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_hsplit_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_hstack_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_igamma_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_int_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_isfinite_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_isneginf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_jiterator_unary_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_kron_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_kthvalue_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_lerp_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_cond_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_eigvals_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_ldl_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_lstsq_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_lu_factor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_norm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_pinv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_pinv_hermitian_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_solve_triangular_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linspace_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_log1p_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_log_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_logdet_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_logical_xor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_logspace_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_lu_unpack_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_mH_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_logaddexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_logsumexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_median_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_msort_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_multinomial_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_ne_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_new_empty_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nextafter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_adaptive_max_pool2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_alpha_dropout_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_batch_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_bilinear_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_conv_transpose1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_conv_transpose3d_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_elu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_gaussian_nll_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_gelu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_interpolate_bilinear_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_interpolate_linear_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_leaky_relu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_pad_replicate_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_pad_replicate_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_pairwise_distance_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_prelu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_upsample_bilinear_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nonzero_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_outer_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_pca_lowrank_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_polygamma_polygamma_n_3_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_prod_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_put_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_randn_like_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_ravel_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_real_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_resize__cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_roll_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_rot90_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_round_decimals_3_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_short_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_signal_windows_bartlett_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_sin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_sparse_mm_reduce_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_laguerre_polynomial_l_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_polygamma_special_polygamma_n_0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_spherical_bessel_j0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_split_with_sizes_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_split_with_sizes_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_squeeze_multiple_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_std_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_std_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_std_mean_unbiased_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_t_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_tile_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_to_sparse_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_torch_ops_aten__efficient_attention_forward_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_torch_ops_aten__safe_softmax_default_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_transpose_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_unbind_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_unfold_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_unfold_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_unsqueeze_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_var_mean_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_var_mean_unbiased_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_view_as_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_view_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_vsplit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_where_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_zeros_like_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_T_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward__unsafe_masked_index_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_abs_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_addr_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_bernoulli_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_cdouble_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_ceil_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_chunk_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_clamp_min_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_contiguous_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_erfinv_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_fft_fft_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_fft_rfftn_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_flatten_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_flipud_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_float_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_gather_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_diagonal_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_pinv_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_pinv_singular_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_solve_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_solve_triangular_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_logit_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_logsumexp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_mH_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_mean_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nan_to_num_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nanquantile_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_batch_norm_without_cudnn_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_conv2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_dropout2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_fractional_max_pool2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_group_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_interpolate_bicubic_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_pad_replicate_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_rms_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_softmin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_norm_inf_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_put_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_repeat_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_round_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_special_erfcx_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_special_ndtr_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_split_with_sizes_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_squeeze_multiple_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_std_unbiased_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_svd_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_transpose_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_unflatten_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_view_as_complex_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_xlogy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input___getitem___cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input___rpow___cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input__softmax_backward_data_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_acos_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_block_diag_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_bmm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_broadcast_to_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_cholesky_solve_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_chunk_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_contiguous_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_cumulative_trapezoid_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_diag_embed_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_diagonal_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_diagonal_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_dstack_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_empty_like_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_erf_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_exponential_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_fft_hfftn_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_fft_ifftshift_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_floor_divide_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_ge_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_isinf_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_item_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_linalg_diagonal_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_linalg_ldl_factor_ex_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_linalg_lu_factor_ex_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_linalg_multi_dot_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_linalg_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_linalg_vecdot_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_log10_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_log_softmax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_logsumexp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_masked_argmax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_masked_cumsum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_masked_median_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_masked_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_maximum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_mode_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nanmean_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_new_full_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_adaptive_max_pool3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_batch_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_celu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_gelu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_max_pool1d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_max_unpool1d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_max_unpool2d_grad_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_pad_constant_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_pixel_shuffle_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_softmin_with_dtype_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nonzero_static_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_permute_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_pinverse_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_repeat_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_round_decimals_3_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_signal_windows_general_hamming_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_signbit_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_special_bessel_y0_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_special_bessel_y1_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_special_chebyshev_polynomial_t_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_special_hermite_polynomial_he_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_special_xlog1py_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_square_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_tan_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_trapezoid_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_trapz_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_triangular_solve_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_view_as_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_zeros_like_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad__batch_norm_with_update_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad__native_batch_norm_legit_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad__softmax_backward_data_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad__upsample_bilinear2d_aa_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_addbmm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_aminmax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_any_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_argsort_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_asin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_atan2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_baddbmm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_bool_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_ceil_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_cos_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_cumulative_trapezoid_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_diagonal_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_double_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_einsum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_equal_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_eye_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fft_hfft_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fft_ifftshift_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_flatten_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_heaviside_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_igammac_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_index_reduce_amax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_item_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_lu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_lu_factor_ex_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_matrix_rank_hermitian_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_qr_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_slogdet_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_tensorsolve_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_logcumsumexp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_lt_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_prod_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_multinomial_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_mv_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_mvlgamma_mvlgamma_p_5_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_adaptive_max_pool2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_avg_pool1d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_conv3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_gaussian_nll_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_interpolate_bicubic_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_interpolate_trilinear_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_max_pool1d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_pad_constant_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_pad_reflect_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_silu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_soft_margin_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_softsign_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_ones_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_permute_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_reshape_as_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_reshape_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_resize_as__cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_rsqrt_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_signal_windows_general_hamming_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_chebyshev_polynomial_u_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_squeeze_multiple_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_std_mean_unbiased_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_t_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_take_along_dim_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_tensordot_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_uniform_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_unique_consecutive_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_unsqueeze_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_view_as_complex_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_view_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_xlogy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_T_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator__chunk_cat_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator__softmax_backward_data_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator__unsafe_masked_index_put_accumulate_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_angle_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_block_diag_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_broadcast_shapes_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_broadcast_to_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_cholesky_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_clamp_min_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_conj_physical_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_contiguous_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_cummax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_double_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_exponential_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_fft_ifft2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_fmax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_fmod_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_geqrf_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_gradient_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_hash_tensor_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_igammac_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_index_reduce_prod_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_item_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_jiterator_2inputs_2outputs_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_jiterator_unary_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_lu_factor_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_matrix_rank_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_tensorsolve_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linspace_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_log_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_logaddexp2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_long_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_lu_solve_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_amax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_cumsum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_mean_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_max_pool2d_with_indices_backward_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_min_reduction_with_dim_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_native_batch_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_native_dropout_backward_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_avg_pool1d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_avg_pool3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_conv2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_conv_transpose1d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_ctc_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_fractional_max_pool3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_gaussian_nll_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_l1_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_max_pool3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_mish_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_multi_margin_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_softsign_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_positive_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_pow_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_randint_like_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_remainder_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_repeat_interleave_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_signal_windows_cosine_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_sin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_laguerre_polynomial_l_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_std_mean_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_std_mean_unbiased_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_tanh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_to_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_topk_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_transpose_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_unique_consecutive_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_var_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_view_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay___rmod___cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay__unsafe_masked_index_put_accumulate_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_addmm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_all_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_asinh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_atleast_2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_cat_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_cholesky_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_chunk_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_column_stack_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_constant_pad_nd_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_copysign_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_count_nonzero_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_cov_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_diagonal_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_diff_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_dist_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_fft_fftshift_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_fft_ifft2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_index_reduce_amin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_index_reduce_prod_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_int_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_isclose_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_isfinite_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_item_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_jiterator_binary_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_lerp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_linalg_ldl_factor_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_linalg_matrix_power_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_linalg_matrix_rank_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_linalg_slogdet_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_log2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_masked_softmax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_max_reduction_with_dim_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_mean_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_mul_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_mv_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_mvlgamma_mvlgamma_p_5_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_ne_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_adaptive_max_pool3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_avg_pool2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_cosine_embedding_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_dropout_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_fractional_max_pool2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_glu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_hardsigmoid_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_hardswish_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_hinge_embedding_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_interpolate_area_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_max_unpool2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_mse_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_multi_margin_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_pad_reflect_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_rrelu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_silu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_softplus_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_threshold_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nonzero_static_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_norm_fro_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_positive_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_randint_like_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_reciprocal_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_repeat_interleave_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_scatter_reduce_amax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_sign_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_signal_windows_cosine_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_signal_windows_gaussian_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_signal_windows_nuttall_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_special_entr_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_special_modified_bessel_k1_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_special_ndtr_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_special_polygamma_special_polygamma_n_0_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_special_scaled_modified_bessel_k0_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_std_unbiased_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_to_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_transpose_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_tril_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_unbind_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_unsqueeze_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_vstack_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_xlogy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_zero__cuda_float32, test/test_ops.py::TestMathBitsCUDA::test_conj_view_T_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view___rdiv___cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view___rsub___cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs__conversions_bool_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs__conversions_char_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs__conversions_double_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_all_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_broadcast_to_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_diagonal_copy_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_diagonal_scatter_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_fft_ifftn_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_flatten_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_fliplr_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_float_power_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_logical_not_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_logical_xor_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_movedim_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_nn_functional_log_softmax_with_dtype_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_rot90_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_sinh_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_special_log_softmax_with_dtype_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_square_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_sub_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_t_copy_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_triu_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_unfold_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_vdot_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_view_copy_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_where_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__unsafe_masked_index_put_accumulate_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_abs_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_add_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_atleast_1d_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_baddbmm_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_byte_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_cfloat_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_chalf_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_char_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_contiguous_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_cumulative_trapezoid_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_div_no_rounding_mode_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_empty_strided_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_equal_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_expand_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_float_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_float_power_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_full_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_gradient_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_isclose_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_lu_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_matrix_rank_hermitian_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_pinv_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_pinv_hermitian_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_vander_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_log_softmax_with_dtype_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_logical_xor_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_logspace_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_lu_solve_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_mT_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_masked_cumsum_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_masked_scatter_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_matmul_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_meshgrid_variadic_tensors_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_nanmean_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_new_full_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_real_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_reshape_as_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_resolve_neg_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_scatter_add_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_scatter_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_short_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_sigmoid_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_softmax_with_dtype_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_svd_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_take_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_triangular_solve_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_unflatten_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_view_as_real_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view___rmul___cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs__conversions_int_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_clone_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_empty_strided_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_fft_hfftn_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_flatten_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_hstack_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_index_copy_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_index_select_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_linalg_diagonal_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_log_softmax_with_dtype_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_logical_and_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_logical_or_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_mean_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_movedim_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_mul_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_narrow_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_new_ones_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_nn_functional_softmax_with_dtype_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_nn_functional_tanhshrink_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_norm_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_sgn_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_squeeze_copy_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_triu_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_true_divide_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_byte_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_cos_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_diag_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_div_no_rounding_mode_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_empty_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_equal_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_expand_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_geqrf_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_hstack_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_index_copy_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_index_select_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_isnan_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_jiterator_4inputs_with_extra_args_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_det_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_inv_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_matrix_rank_hermitian_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_logdet_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_logsumexp_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_mT_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_masked_cumsum_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_masked_fill_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_masked_logsumexp_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_masked_normalize_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_masked_select_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_channel_shuffle_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_triplet_margin_with_distance_loss_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_norm_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_repeat_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_reshape_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_resolve_neg_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_roll_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_sqrt_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_sum_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_unflatten_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_uniform_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_view_copy_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_view__chunk_cat_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs__conversions_double_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs__conversions_long_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_addcdiv_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_addcmul_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_arange_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_as_strided_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_clamp_min_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_cos_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_diagonal_scatter_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_ihfftn_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_floor_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fmod_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_index_fill_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_item_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_le_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_linalg_svd_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_minimum_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_new_full_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_relu6_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_threshold_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_normal_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_permute_copy_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_renorm_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_round_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_special_bessel_j0_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_special_logit_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_t_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_unbind_copy_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_vdot_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_vsplit_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_acos_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_atan2_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_bernoulli_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_cat_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_diag_embed_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_dist_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_empty_like_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_fft_ifft2_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_fft_irfft_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_frexp_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_hash_tensor_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_index_copy_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_index_reduce_amin_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_isnan_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_item_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_jiterator_2inputs_2outputs_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_kthvalue_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_lgamma_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_inv_ex_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_matrix_power_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_logical_and_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_logical_or_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_logsumexp_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_min_reduction_no_dim_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_mvlgamma_mvlgamma_p_5_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_narrow_copy_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_channel_shuffle_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_conv1d_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_hardtanh_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_interpolate_bicubic_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_local_response_norm_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_pairwise_distance_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_softmin_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nonzero_static_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_norm_fro_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_ones_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_permute_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_polar_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_rand_like_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_randint_like_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_ravel_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_rsub_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_select_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_signal_windows_bartlett_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_signal_windows_hann_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_signal_windows_nuttall_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_sinc_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_sort_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_log_ndtr_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_modified_bessel_k1_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_scaled_modified_bessel_k0_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_scaled_modified_bessel_k1_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_xlog1py_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_square_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_sum_to_size_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_uniform_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_unsafe_split_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_view_copy_cuda_float64, test/test_ops.py::TestFakeTensorCUDA::test_fake_acosh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_alias_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_allclose_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_argmin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_as_strided_partial_views_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_atanh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_atleast_2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_H_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast___ror___cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast__batch_norm_with_update_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast__upsample_bilinear2d_aa_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_addmm_decomposed_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_allclose_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_amax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_bitwise_not_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_bitwise_or_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_cartesian_prod_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_diagflat_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fft_hfft2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fft_hfftn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fft_ihfft2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_full_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_ge_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_half_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_isnan_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_isreal_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_cholesky_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_cholesky_ex_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_lu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_norm_subgradients_at_zero_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_log_normal_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_logaddexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_logspace_tensor_overload_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_logsumexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_masked_amax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_masked_argmin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_masked_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_median_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_new_full_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nextafter_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_alpha_dropout_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_bilinear_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_conv_transpose2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_cosine_similarity_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_hardsigmoid_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_hardtanh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_multi_head_attention_forward_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_pdist_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_soft_margin_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_normal_number_mean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_ones_like_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_randn_like_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_scatter_reduce_prod_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_short_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_sparse_mm_reduce_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_legendre_polynomial_p_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_log_ndtr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_squeeze_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_squeeze_multiple_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_stft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_topk_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_torch__scaled_mm_cuda_float8_e4m3fn, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_torch_ops_aten__efficient_attention_forward_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_unflatten_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_var_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_view_as_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_baddbmm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_cdist_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_clamp_max_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_clamp_min_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_column_stack_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_as_strided_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_atan2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_block_diag_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_cholesky_inverse_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_corrcoef_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_cos_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_diag_embed_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_diagflat_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_diff_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_div_trunc_rounding_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_dstack_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fft_irfft2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_i0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_index_fill_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_index_reduce_mean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_lgamma_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_eigvalsh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_householder_product_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_tensorsolve_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_lu_solve_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_mT_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_masked_log_softmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_masked_logaddexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_masked_softmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_median_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_meshgrid_variadic_tensors_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_minimum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_avg_pool3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_batch_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_binary_cross_entropy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_celu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_conv1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_cross_entropy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_group_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_interpolate_trilinear_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_linear_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_max_pool1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_max_unpool1d_grad_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_max_unpool3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_relu6_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_relu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_rrelu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_unfold_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_ormqr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_prod_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_resolve_conj_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_round_decimals_3_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_sinh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_to_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_to_sparse_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_trace_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_trapezoid_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_unsqueeze_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_var_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_var_unbiased_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_view_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp___rdiv___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_addbmm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_addcdiv_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_addmv_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_atan2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_atleast_2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_cholesky_solve_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_constant_pad_nd_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_deg2rad_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_diagonal_scatter_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_einsum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fft_fft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_half_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_cholesky_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_eig_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_eigvalsh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_multi_dot_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_log10_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_mT_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_masked_log_softmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_masked_logaddexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_meshgrid_list_of_tensors_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_movedim_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_native_batch_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_adaptive_max_pool3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_conv_transpose1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_hardswish_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_l1_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_max_unpool2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_max_unpool3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_pad_constant_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_pairwise_distance_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_relu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_rrelu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_softplus_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_threshold_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_outer_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_permute_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_prod_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_qr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_reciprocal_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_scatter_reduce_amax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_sign_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_special_entr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_special_xlog1py_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_square_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_squeeze_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_sub_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_torch_ops_aten__efficient_attention_forward_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_trunc_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_unsafe_chunk_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_view_as_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_cummin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_div_floor_rounding_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_empty_strided_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_erf_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_fft_fftshift_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_fft_ifft2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_fft_rfft2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_fft_rfft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_float_power_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_full_like_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_index_reduce_mean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_inner_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_isin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_isposinf_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_isreal_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_inv_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_svdvals_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_log_softmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_max_binary_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_meshgrid_list_of_tensors_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_mul_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_mv_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_avg_pool2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_bilinear_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_celu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_dropout3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_group_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_huber_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_interpolate_bicubic_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_interpolate_linear_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_max_unpool2d_grad_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_max_unpool3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_one_hot_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_pairwise_distance_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_soft_margin_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_softmin_with_dtype_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_polygamma_polygamma_n_3_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_positive_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_rand_like_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_randn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_remainder_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_reshape_as_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_resize__cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_resolve_neg_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_scatter_reduce_prod_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_sin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_bessel_j1_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_sqrt_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_std_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_torch_ops_aten__safe_softmax_default_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_unbind_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_unsafe_split_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_unsqueeze_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_vdot_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops___radd___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops__chunk_cat_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops__segment_reduce_offsets_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_abs_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_acosh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_addbmm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_atleast_1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_bool_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_cat_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_complex_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_diag_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_diff_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_div_floor_rounding_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_dot_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_empty_like_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_empty_permuted_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fft_ifft2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fft_irfftn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_gradient_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_hsplit_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_index_reduce_amax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_isneginf_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_isreal_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_kthvalue_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_lcm_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_lerp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_cholesky_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_cholesky_ex_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_householder_product_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_ldl_factor_ex_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_lu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_pinv_hermitian_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_vander_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_vector_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_logdet_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_logical_not_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_lt_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_lu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_amax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_logsumexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_max_reduction_with_dim_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_mm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_movedim_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nan_to_num_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nanmean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_new_empty_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_adaptive_avg_pool3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_adaptive_max_pool2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_batch_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_cosine_embedding_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_dropout_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_feature_alpha_dropout_without_train_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_hardshrink_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_l1_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_linear_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_max_pool2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_multi_margin_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_pad_constant_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_prelu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_smooth_l1_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_softplus_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_triplet_margin_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nonzero_static_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_ones_like_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_permute_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_polygamma_polygamma_n_2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_remainder_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_scalar_tensor_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_scatter_reduce_mean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_sin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_sparse_mm_reduce_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_bessel_y0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_bessel_y1_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_chebyshev_polynomial_t_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_chebyshev_polynomial_v_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_hermite_polynomial_he_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_log_ndtr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_stack_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_stft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_take_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_to_sparse_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_tril_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_triu_indices_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_unbind_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_unique_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_view_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_vsplit_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_where_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_linspace_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_linspace_tensor_overload_cuda_int16, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_logspace_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_ones_cuda_bool, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_zeros_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_zeros_cuda_float64, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_arange_cuda_float64, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_arange_cuda_int16, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_linspace_cuda_float64, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_linspace_tensor_overload_cuda_complex64, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_logspace_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_logspace_tensor_overload_cuda_complex64, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_logspace_tensor_overload_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_ones_cuda_float16, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_ones_cuda_int32, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_ones_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_ones_cuda_int8, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_zeros_cuda_bool, test/test_ops.py::TestTagsCUDA::test_tags_T_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags___radd___cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags___rmod___cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags___ror___cuda_int64, test/test_ops.py::TestTagsCUDA::test_tags___rpow___cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_T_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs__conversions_byte_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs__conversions_cdouble_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs__conversions_complex_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs__conversions_int_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_atleast_2d_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_broadcast_tensors_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_broadcast_to_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_cat_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_ceil_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_diagonal_scatter_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_expand_copy_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_fft_ifft_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_fft_ifftshift_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_flipud_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_fmod_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_i0_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_isclose_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_logical_or_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_logspace_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nan_to_num_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_narrow_copy_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_new_full_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_celu_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_elu_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_group_norm_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_hardshrink_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_margin_ranking_loss_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_poisson_nll_loss_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_softmax_with_dtype_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_triplet_margin_loss_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_normal__in_place_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_renorm_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_rot90_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_sign_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_special_multigammaln_mvlgamma_p_3_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_special_softmax_with_dtype_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_tanh_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_true_divide_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_unbind_copy_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_unflatten_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_unfold_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_unsqueeze_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_var_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_abs_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_addmm_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_argmin_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_atan2_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_atan_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_bitwise_left_shift_cuda_int64, test/test_ops.py::TestTagsCUDA::test_tags_bitwise_xor_cuda_int64, test/test_ops.py::TestTagsCUDA::test_tags_deg2rad_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_erfinv_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_expand_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_fft_ifft2_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_fft_ifft_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_fft_ifftshift_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_fft_irfftn_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_flip_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_fmin_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_index_reduce_mean_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_cross_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_lstsq_grad_oriented_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_lu_factor_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_matrix_power_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_qr_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_svd_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_vander_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_logaddexp_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_logdet_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_logical_and_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_masked_argmin_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_masked_cumsum_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_masked_fill_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_masked_softmax_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_adaptive_max_pool2d_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_cosine_similarity_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_feature_alpha_dropout_without_train_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_interpolate_area_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_interpolate_nearest-exact_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_logsigmoid_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_max_unpool2d_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_normalize_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_pad_replicate_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_relu_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_silu_complex_cuda_complex64, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_silu_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_softmin_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nonzero_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_norm_inf_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_ormqr_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_resize_as__cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_roll_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_scatter_add_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_scatter_reduce_mean_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_sign_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_signal_windows_general_hamming_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_signal_windows_kaiser_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_signbit_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_sparse_mm_reduce_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_special_i1e_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_special_modified_bessel_i0_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_special_xlog1py_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_split_with_sizes_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_std_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_triu_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_trunc_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_unsafe_chunk_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_unsqueeze_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_view_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_where_cuda_float32 2025-08-14T22:58:50.7312591Z 2025-08-14T22:58:50.7312726Z Running test_modules 3/3 ... [2025-08-14 22:58:50.550238] 2025-08-14T22:58:50.7313010Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T22:58:50.7313913Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_modules.py', '-m', 'not serial', '--shard-id=3', '--num-shards=3', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 22:58:50.550611] 2025-08-14T23:04:19.0762577Z 2025-08-14T23:04:19.0763989Z test_nestedtensor 2/3 was successful, full logs can be found in artifacts with path test/test-reports/test_nestedtensor_2.3_c459871d4643317c_.log 2025-08-14T23:04:19.1016925Z Running 527 items in this shard: test/test_nestedtensor.py::TestNestedTensor::test_2d_nested_tensor_batch_size_2_max_seq_len_3_vocab_size_10, test/test_nestedtensor.py::TestNestedTensor::test_2d_nested_tensor_batch_size_4_max_seq_len_5_vocab_size_10, test/test_nestedtensor.py::TestNestedTensor::test_3d_nested_tensor_batch_size_4_max_seq_len_3_vocab_size_20, test/test_nestedtensor.py::TestNestedTensor::test_3d_nested_tensor_float_batch_size_2_max_seq_len_3_vocab_size_10, test/test_nestedtensor.py::TestNestedTensor::test_cat, test/test_nestedtensor.py::TestNestedTensor::test_is_contiguous, test/test_nestedtensor.py::TestNestedTensor::test_nested_view_from_buffer_overflow_errors, test/test_nestedtensor.py::TestNestedTensor::test_numel, test/test_nestedtensor.py::TestNestedTensor::test_repr_string, test/test_nestedtensor.py::TestNestedTensor::test_size_dim, test/test_nestedtensor.py::TestNestedTensor::test_stride, test/test_nestedtensor.py::TestNestedTensor::test_to, test/test_nestedtensor.py::TestNestedTensor::test_to_padded_tensor_on_empty_tensor, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_bmm_cpu_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_bmm_cpu_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_bmm_cuda_cuda_bfloat16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_bmm_noncontiguous_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_clone_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_contiguous_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_dropout_jagged_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_empty_like_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_layer_norm_breaking_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_linear_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_masked_fill_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_matmul_noncontiguous_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_matmul_noncontiguous_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_matmul_nt_with_broadcasted_t_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_narrow_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_add_in_place_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_chunk_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_dense_elementwise_embedding_dim_256_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_dense_elementwise_embedding_dim_256_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_dense_elementwise_embedding_dim_384_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_dense_elementwise_embedding_dim_8_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_div_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_div_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_indexing_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_indexing_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_mul_in_place_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_sub_transpose_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_sub_transpose_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_sub_transpose_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_sub_transpose_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_sum_dim_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_reshape_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_serialization_requires_grad_False_weights_only_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_serialization_requires_grad_False_weights_only_True_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_serialization_requires_grad_True_weights_only_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_serialization_requires_grad_True_weights_only_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_serialization_requires_grad_True_weights_only_True_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_softmax_noncontiguous_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_squeeze_unsqueeze_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_to_padded_tensor_dim2_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_to_padded_tensor_dim2_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_to_padded_tensor_dim3_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_to_padded_tensor_dim3_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_to_padded_tensor_dim4_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_to_padded_tensor_output_size_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_to_padded_tensor_simple_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_to_padded_tensor_zero_numel_errors_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_to_padded_tensor_zero_numel_errors_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_to_then_from_padded_tensor_no_transform0213_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_transpose_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_transpose_inference_mode_interaction_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_unary_funcs_gelu_cuda, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_unary_funcs_isneginf_cuda, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_unary_funcs_logical_not_cuda, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_unary_funcs_neg_cuda, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_view_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_view_inference_mode_interaction_cuda_float16, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_accumulate_grad_different_strides_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_as_nested_tensor_propagates_gradients_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_backward_add_strided_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_backward_for_add_op_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_backward_sub_strided_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_gelu_backward_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_indexing_backward_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_layer_norm_backward_5d_size_32_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_layer_norm_backward_size_1024_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_layer_norm_backward_size_512_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_nested_tensor_from_list_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_nested_tensor_from_mask_and_to_padded_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_nested_tensor_from_padded_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_nested_tensor_matmul_gradcheck_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_nested_tensor_squeeze_gradcheck_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_nested_tensor_transpose_backward_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_nested_tensor_transpose_gradcheck_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_selu_backward_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_set_requires_grad_from_list_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_to_buffer_series_ops_grad_with_broadcast_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_0_layout_jagged_requires_grad_False_contiguous_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_0_layout_jagged_requires_grad_False_contiguous_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_0_layout_jagged_requires_grad_True_contiguous_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_0_layout_jagged_requires_grad_True_contiguous_False_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_0_layout_jagged_requires_grad_True_contiguous_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_0_layout_strided_requires_grad_False_contiguous_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_0_layout_strided_requires_grad_False_contiguous_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_0_layout_strided_requires_grad_False_contiguous_False_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_0_layout_strided_requires_grad_True_contiguous_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_0_layout_strided_requires_grad_True_contiguous_False_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_0_layout_strided_requires_grad_True_contiguous_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_1_layout_jagged_requires_grad_False_contiguous_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_1_layout_jagged_requires_grad_False_contiguous_False_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_1_layout_jagged_requires_grad_False_contiguous_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_1_layout_jagged_requires_grad_False_contiguous_True_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_1_layout_jagged_requires_grad_True_contiguous_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_1_layout_strided_requires_grad_False_contiguous_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_1_layout_strided_requires_grad_False_contiguous_True_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_1_layout_strided_requires_grad_True_contiguous_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_1_layout_strided_requires_grad_True_contiguous_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_1_layout_strided_requires_grad_True_contiguous_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_2_layout_jagged_requires_grad_False_contiguous_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_2_layout_jagged_requires_grad_False_contiguous_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_2_layout_jagged_requires_grad_True_contiguous_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_2_layout_jagged_requires_grad_True_contiguous_False_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_2_layout_strided_requires_grad_False_contiguous_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_2_layout_strided_requires_grad_True_contiguous_True_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_3_layout_jagged_requires_grad_False_contiguous_False_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_3_layout_jagged_requires_grad_False_contiguous_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_3_layout_jagged_requires_grad_False_contiguous_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_3_layout_jagged_requires_grad_True_contiguous_False_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_3_layout_jagged_requires_grad_True_contiguous_True_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_3_layout_strided_requires_grad_False_contiguous_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_3_layout_strided_requires_grad_False_contiguous_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_3_layout_strided_requires_grad_False_contiguous_True_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_3_layout_strided_requires_grad_True_contiguous_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_3_layout_strided_requires_grad_True_contiguous_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_3_layout_strided_requires_grad_True_contiguous_True_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_4_layout_jagged_requires_grad_False_contiguous_False_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_4_layout_jagged_requires_grad_False_contiguous_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_4_layout_jagged_requires_grad_True_contiguous_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_4_layout_jagged_requires_grad_True_contiguous_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_4_layout_jagged_requires_grad_True_contiguous_True_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_4_layout_strided_requires_grad_False_contiguous_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_4_layout_strided_requires_grad_False_contiguous_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_4_layout_strided_requires_grad_True_contiguous_False_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_binary_pointwise_broadcasting_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_binary_pointwise_transposed_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_compile_padded_dense_conversion_preserves_metadata_cache_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_composite_op_in_inference_mode_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_construction_from_list_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_copy__cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_device_dtype_transfer_updates_offsets_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_dropout_inference_mode_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_dummy_mha_with_nt_use_legacy_api_False_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_flex_attention_noncontig_with_holes_False_cross_attention_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_flex_attention_noncontig_with_holes_False_cross_attention_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_layout_construction_as_nested_tensor_components_require_grad_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_layout_construction_nested_tensor_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_layout_construction_nested_tensor_requires_grad_True_components_require_grad_False_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_layout_construction_nested_tensor_requires_grad_True_components_require_grad_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_layout_construction_nested_tensor_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_layout_construction_with_pinned_memory_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_op_different_output_shape_dim_mean_keepdim_False_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_op_different_output_shape_dim_mean_keepdim_True_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_op_different_output_shape_dim_mean_keepdim_True_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_op_different_output_shape_dim_mean_keepdim_True_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_op_different_output_shape_dim_sum_keepdim_True_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_view_from_values_offsets_requires_grad_False_values_is_view_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_view_from_values_offsets_requires_grad_False_values_is_view_False_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_view_from_values_offsets_requires_grad_True_values_is_view_False_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_view_from_values_offsets_requires_grad_True_values_is_view_True_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_layer_norm_2d_input_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_layer_norm_2d_input_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_layer_norm_operate_on_batch_dim_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_layer_norm_operate_on_batch_dim_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_layer_norm_reduce_ragged_idx_1_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_like_shape_empty_like_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_like_shape_randn_like_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_like_value_rand_like_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_narrow_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_nested_tensor_from_jagged_pass_min_max_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_njt_cat_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_noncontiguous_to_noncontig_transposed_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_batch_only_different_output_shape_mean_keepdim_False_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_batch_only_different_output_shape_mean_keepdim_False_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_batch_only_different_output_shape_mean_keepdim_False_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_batch_only_different_output_shape_sum_keepdim_True_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_1_different_output_shape_mean_keepdim_False_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_1_different_output_shape_mean_keepdim_False_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_1_different_output_shape_mean_keepdim_False_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_1_different_output_shape_mean_keepdim_True_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_1_different_output_shape_sum_keepdim_False_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_1_different_output_shape_sum_keepdim_False_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_1_different_output_shape_sum_keepdim_False_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_1_different_output_shape_sum_keepdim_True_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_mean_transpose_offset_1_keepdim_False_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_mean_transpose_offset_2_keepdim_False_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_mean_transpose_offset_2_keepdim_False_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_sum_transpose_offset_1_keepdim_False_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_sum_transpose_offset_2_keepdim_True_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_sum_transpose_offset_2_keepdim_True_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_transpose_non_ragged_dim_different_output_shape_mean_keepdim_False_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_transpose_non_ragged_dim_different_output_shape_mean_keepdim_False_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_transpose_non_ragged_dim_different_output_shape_mean_keepdim_False_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_transpose_non_ragged_dim_different_output_shape_mean_keepdim_True_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_transpose_non_ragged_dim_different_output_shape_mean_keepdim_True_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_transpose_non_ragged_dim_different_output_shape_sum_keepdim_True_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_transpose_non_ragged_dim_different_output_shape_sum_keepdim_True_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_transpose_non_ragged_dim_different_output_shape_sum_keepdim_True_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_with_lengths_different_output_shape_mean_keepdim_False_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_with_lengths_different_output_shape_mean_keepdim_False_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_with_lengths_different_output_shape_mean_keepdim_True_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_with_lengths_different_output_shape_mean_keepdim_True_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_with_lengths_different_output_shape_mean_keepdim_True_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_with_lengths_different_output_shape_sum_keepdim_False_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_with_lengths_different_output_shape_sum_keepdim_False_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_with_lengths_different_output_shape_sum_keepdim_False_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_reshape_decomp_requires_grad_True_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sdpa_autocast_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sdpa_backwards_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sdpa_backwards_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sdpa_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sdpa_with_constant_sequence_length_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sdpa_with_constant_sequence_length_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sdpa_with_packed_in_proj_cuda_bfloat16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sdpa_with_packed_in_proj_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sdpa_with_packed_in_proj_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_serialization_noncontig_transposed_weights_only_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_serialization_noncontig_with_holes_weights_only_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_dim_reduce_ragged_idx_1_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_dim_reduce_ragged_idx_1_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_dim_reduce_ragged_idx_greater_than_1_same_output_shape_transpose_offset_1_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_dim_reduce_ragged_idx_greater_than_1_same_output_shape_transpose_offset_1_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_dim_reduce_ragged_idx_greater_than_1_same_output_shape_transpose_offset_2_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_dim_requires_grad_True_components_require_grad_False_softmax_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_dim_requires_grad_True_components_require_grad_True_log_softmax_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_dim_with_lengths_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_reduce_batch_dim_requires_grad_False_components_require_grad_True_log_softmax_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_reduce_batch_dim_requires_grad_False_components_require_grad_True_softmax_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_split_with_sizes_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_squeeze_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sum_dim_reduce_batch_and_non_batch_keepdim_False_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sum_dim_reduce_batch_and_non_batch_keepdim_True_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sum_dim_reduce_ragged_and_non_batch_keepdim_False_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sum_dim_reduce_ragged_and_non_batch_keepdim_True_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_threshold_backward_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_copy_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_dtype_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_compile_nt_dim_2_requires_grad_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_compile_nt_dim_2_requires_grad_False_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_compile_nt_dim_2_requires_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_compile_nt_dim_2_requires_grad_True_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_compile_nt_dim_3_requires_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_compile_nt_dim_3_requires_grad_False_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_compile_nt_dim_3_requires_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_compile_nt_dim_3_requires_grad_True_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_compile_nt_dim_4_requires_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_compile_nt_dim_4_requires_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_compile_nt_dim_4_requires_grad_True_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_nt_dim_2_requires_grad_True_cuda_bool, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_nt_dim_2_requires_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_nt_dim_3_requires_grad_True_cuda_bool, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_nt_dim_3_requires_grad_True_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_nt_dim_4_requires_grad_False_cuda_bool, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_nt_dim_4_requires_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_nt_dim_4_requires_grad_True_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_unbind_backward_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_unbind_lengths_ragged_idx_2_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_unbind_transpose_ragged_idx_2_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_view_ragged_idx_not_one_cuda, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward___rdiv___cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward___rmod___cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_acos_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_asin_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_asinh_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_atan2_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_atan_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_atanh_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_cdouble_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_div_trunc_rounding_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_double_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_float_power_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_fmod_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_i0_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_index_put_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_lgamma_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_logaddexp_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_masked_amax_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_masked_amin_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_masked_norm_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_masked_prod_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_masked_std_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_min_binary_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_min_reduction_with_dim_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_mvlgamma_mvlgamma_p_1_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_mvlgamma_mvlgamma_p_5_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_nan_to_num_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_narrow_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_nn_functional_prelu_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_nn_functional_rms_norm_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_nn_functional_softsign_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_nn_functional_threshold_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_polygamma_polygamma_n_1_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_pow_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_prod_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_round_decimals_0_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_select_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_sigmoid_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_sinc_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_special_entr_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_special_i0e_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_special_polygamma_special_polygamma_n_0_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_special_xlog1py_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_sqrt_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_squeeze_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_std_unbiased_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_sum_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_tan_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_tanh_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_unsqueeze_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_xlogy_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward___rmul___cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward___rpow___cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_acos_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_amax_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_amin_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_atanh_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_ceil_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_clamp_min_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_erfinv_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_fmax_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_hypot_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_index_put_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_log2_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_masked_amin_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_masked_logsumexp_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_masked_select_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_max_binary_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_maximum_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_min_reduction_with_dim_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_minimum_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_mul_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_nan_to_num_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_nanmean_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_narrow_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_nn_functional_celu_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_nn_functional_elu_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_nn_functional_linear_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_nn_functional_mish_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_nn_functional_relu6_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_nn_functional_rms_norm_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_nn_functional_rrelu_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_nn_functional_silu_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_nn_functional_softshrink_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_remainder_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_rsqrt_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_sigmoid_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_sin_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_sinh_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_special_entr_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_special_i0e_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_special_i1_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_special_log_ndtr_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_special_ndtr_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_special_polygamma_special_polygamma_n_0_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_split_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_split_with_sizes_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_sqrt_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_sum_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_tan_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_tanh_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_to_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_unflatten_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_unsqueeze_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_xlogy_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward___radd___cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward___rdiv___cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_argmax_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_argmin_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_asin_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_atanh_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_bfloat16_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_byte_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_chalf_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_char_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_chunk_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_clamp_max_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_clamp_min_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_conj_physical_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_copysign_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_cosh_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_deg2rad_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_digamma_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_div_floor_rounding_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_div_trunc_rounding_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_double_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_eq_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_erf_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_erfinv_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_expm1_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_floor_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_floor_divide_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_fmax_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_frexp_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_gt_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_i0_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_index_put_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_isclose_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_isnan_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_jiterator_binary_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_jiterator_binary_return_by_ref_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_linalg_vector_norm_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_log_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_logaddexp_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_long_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_masked_logsumexp_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_masked_std_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_max_binary_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_mean_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_nanmean_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_narrow_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_nn_functional_hardsigmoid_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_nn_functional_prelu_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_nn_functional_relu6_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_nn_functional_rms_norm_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_nn_functional_threshold_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_polygamma_polygamma_n_0_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_polygamma_polygamma_n_1_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_polygamma_polygamma_n_2_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_polygamma_polygamma_n_4_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_pow_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_prod_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_remainder_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_round_decimals_3_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_round_decimals_neg_3_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_rsqrt_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_rsub_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_sgn_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_bessel_j1_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_bessel_y1_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_chebyshev_polynomial_w_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_hermite_polynomial_h_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_hermite_polynomial_he_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_i1_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_i1e_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_modified_bessel_i0_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_ndtri_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_polygamma_special_polygamma_n_0_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_scaled_modified_bessel_k1_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_split_with_sizes_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_square_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_tanh_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_to_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_trunc_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_unflatten_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_var_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_where_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward___radd___cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward___rdiv___cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward___rmul___cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward___rpow___cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward___rsub___cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_argmax_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_asinh_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_cdouble_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_chalf_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_chunk_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_clamp_max_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_clamp_min_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_count_nonzero_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_deg2rad_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_digamma_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_div_no_rounding_mode_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_erfinv_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_exp2_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_exp_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_float_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_frac_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_frexp_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_hypot_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_igamma_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_igammac_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_index_put_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_isfinite_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_isposinf_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_jiterator_binary_return_by_ref_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_linalg_vector_norm_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_log10_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_log1p_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_log_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_logical_not_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_masked_amax_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_masked_argmin_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_masked_prod_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_masked_sum_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_max_binary_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_max_reduction_with_dim_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_mvlgamma_mvlgamma_p_1_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_mvlgamma_mvlgamma_p_5_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_nan_to_num_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_nanmean_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_nn_functional_celu_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_nn_functional_elu_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_nn_functional_relu6_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_nn_functional_rrelu_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_nn_functional_selu_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_nn_functional_silu_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_nn_functional_softplus_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_nn_functional_softsign_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_positive_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_prod_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_rad2deg_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_rsub_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_sigmoid_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_sign_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_sinc_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_sinh_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_airy_ai_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_bessel_j1_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_chebyshev_polynomial_v_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_hermite_polynomial_he_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_log_ndtr_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_modified_bessel_i0_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_modified_bessel_i1_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_ndtri_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_std_unbiased_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_sum_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_to_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_var_unbiased_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_where_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_nested_tensor_non_contiguous_mutation_cuda 2025-08-14T23:04:19.1368385Z 2025-08-14T23:04:19.1368612Z Running test_decomp 6/21 ... [2025-08-14 23:04:19.077621] 2025-08-14T23:04:19.1369118Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T23:04:19.1370457Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_decomp.py', '-m', 'not serial', '--shard-id=6', '--num-shards=21', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 23:04:19.078210] 2025-08-14T23:06:48.9926222Z 2025-08-14T23:06:48.9931745Z test_modules 3/3 was successful, full logs can be found in artifacts with path test/test-reports/test_modules_3.3_e21064428c400faa_.log 2025-08-14T23:06:49.0632670Z Running 1240 items in this shard: test/test_modules.py::TestModuleCUDA::test_check_inplace_nn_CELU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_check_inplace_nn_ELU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_check_inplace_nn_Hardswish_cuda_float32, test/test_modules.py::TestModuleCUDA::test_check_inplace_nn_LeakyReLU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_check_inplace_nn_ReLU6_cuda_float32, test/test_modules.py::TestModuleCUDA::test_check_inplace_nn_ReLU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_check_inplace_nn_SELU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_check_inplace_nn_SiLU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_check_inplace_nn_SiLU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_AdaptiveAvgPool1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_AdaptiveMaxPool1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_AvgPool1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_AvgPool2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_AvgPool3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_BCELoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_BCELoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_BatchNorm1d_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_BatchNorm1d_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_BatchNorm3d_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_BatchNorm3d_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_BatchNorm3d_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_Bilinear_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_CELU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_CircularPad1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_CircularPad2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_ConstantPad1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_ConstantPad3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_Conv1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_Conv1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_Conv2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_Conv3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_ConvTranspose1d_cuda_complex32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_ConvTranspose1d_cuda_complex64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_ConvTranspose1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_ConvTranspose2d_cuda_complex64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_ConvTranspose2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_ConvTranspose2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_ConvTranspose3d_cuda_complex128, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_ConvTranspose3d_cuda_complex32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_CrossEntropyLoss_cuda_float16, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_ELU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_Embedding_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_Embedding_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_FractionalMaxPool3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_GLU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_GRUCell_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_GRUCell_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_GRU_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_GRU_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_GroupNorm_cuda_float16, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_Hardswish_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_InstanceNorm1d_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_InstanceNorm2d_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_InstanceNorm2d_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_InstanceNorm3d_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_LPPool1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_LPPool3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_LSTMCell_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_LSTMCell_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_LayerNorm_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_LayerNorm_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_LazyConv1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_LazyConvTranspose1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_LazyConvTranspose1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_LazyConvTranspose2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_LeakyReLU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_LeakyReLU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_LogSigmoid_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_MarginRankingLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_MaxPool1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_MaxPool2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_MaxPool2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_MaxPool3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_MultiLabelSoftMarginLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_MultiheadAttention_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_MultiheadAttention_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_NLLLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_PReLU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_PReLU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_PoissonNLLLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_RMSNorm_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_RNN_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_RNN_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_ReflectionPad1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_ReflectionPad2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_ReplicationPad3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_SELU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_SiLU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_Sigmoid_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_Sigmoid_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_SoftMarginLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_Softmax2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_Softmax2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_Softplus_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_Tanh_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_Tanhshrink_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_TransformerEncoderLayer_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_TransformerEncoder_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_TransformerEncoder_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_Transformer_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_Transformer_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_ZeroPad1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_ZeroPad3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_ZeroPad3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_AdaptiveAvgPool2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_AdaptiveMaxPool3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_AvgPool2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_AvgPool3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_BCEWithLogitsLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_BatchNorm1d_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_BatchNorm1d_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_BatchNorm2d_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_BatchNorm3d_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_BatchNorm3d_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_Bilinear_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_CELU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_CELU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_CTCLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_CTCLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_CircularPad1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_CircularPad2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_CircularPad3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_CircularPad3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_ConstantPad3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_Conv1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_Conv2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_ConvTranspose2d_cuda_complex32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_ConvTranspose2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_ConvTranspose3d_cuda_complex128, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_ConvTranspose3d_cuda_complex64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_ELU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_Embedding_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_FractionalMaxPool2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_FractionalMaxPool2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_GRUCell_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_GRUCell_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_GRU_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_GroupNorm_cuda_bfloat16, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_GroupNorm_cuda_float16, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_GroupNorm_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_GroupNorm_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_HingeEmbeddingLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_HingeEmbeddingLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_InstanceNorm2d_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_InstanceNorm3d_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_InstanceNorm3d_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_KLDivLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_LPPool2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_LSTM_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_LazyConv3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_LazyConvTranspose1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_LazyConvTranspose2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_Linear_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_LogSoftmax_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_MSELoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_MarginRankingLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_MarginRankingLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_MaxPool1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_MaxPool2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_MaxPool2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_MultiLabelMarginLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_MultiLabelMarginLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_MultiLabelSoftMarginLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_MultiMarginLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_MultiMarginLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_MultiheadAttention_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_NLLLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_PReLU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_RMSNorm_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_RNNCell_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_RNN_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_ReLU6_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_ReLU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_ReflectionPad2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_ReflectionPad3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_ReplicationPad1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_ReplicationPad2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_ReplicationPad3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_SELU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_SiLU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_Sigmoid_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_Softmax2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_Softmin_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_Softshrink_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_Softsign_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_TransformerEncoderLayer_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_TransformerEncoder_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_TransformerEncoder_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_Transformer_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_ZeroPad1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_ZeroPad3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_errors_nn_CircularPad2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_errors_nn_CircularPad3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_errors_nn_CircularPad3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_errors_nn_GRUCell_cuda_float64, test/test_modules.py::TestModuleCUDA::test_errors_nn_GRU_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_errors_nn_LSTM_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_errors_nn_RNNCell_cuda_float32, test/test_modules.py::TestModuleCUDA::test_errors_nn_RNN_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_errors_nn_RNN_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_errors_nn_RNN_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_AdaptiveAvgPool2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_AdaptiveAvgPool2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_AdaptiveMaxPool2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_AvgPool1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_AvgPool2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_AvgPool2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_AvgPool3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_AvgPool3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_BCELoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_BCEWithLogitsLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_BatchNorm1d_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_BatchNorm3d_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_BatchNorm3d_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_CircularPad1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_CircularPad3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_ConstantPad1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_ConstantPad2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_ConstantPad3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_ConstantPad3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_ConvTranspose1d_cuda_complex128, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_ConvTranspose1d_cuda_complex64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_ConvTranspose1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_ConvTranspose2d_cuda_complex32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_ConvTranspose2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_CosineEmbeddingLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_CrossEntropyLoss_cuda_float16, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_Embedding_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_Embedding_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_FractionalMaxPool2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_FractionalMaxPool3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_GLU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_GRU_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_GaussianNLLLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_GroupNorm_cuda_bfloat16, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_Hardshrink_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_Hardswish_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_HingeEmbeddingLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_HuberLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_InstanceNorm1d_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_InstanceNorm1d_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_InstanceNorm1d_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_InstanceNorm2d_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_InstanceNorm3d_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_L1Loss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_LPPool1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_LPPool2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_LPPool3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_LSTMCell_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_LSTM_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_LSTM_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_LSTM_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_LayerNorm_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_LazyConv2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_LazyConv3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_LazyConvTranspose1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_LazyConvTranspose3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_LazyConvTranspose3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_LeakyReLU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_MSELoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_MSELoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_MaxPool3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_Mish_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_MultiheadAttention_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_MultiheadAttention_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_RNNCell_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_RNNCell_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_ReLU6_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_ReLU6_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_ReLU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_ReflectionPad1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_ReflectionPad2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_ReflectionPad3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_ReplicationPad1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_ReplicationPad1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_ReplicationPad3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_SELU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_SiLU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_SiLU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_Sigmoid_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_SmoothL1Loss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_SoftMarginLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_Softmax2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_Softmax_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_Softshrink_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_Softshrink_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_Tanh_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_Tanh_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_Tanhshrink_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_Tanhshrink_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_Threshold_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_Threshold_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_TransformerDecoderLayer_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_TransformerEncoder_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_Transformer_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_Transformer_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_ZeroPad2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_ZeroPad3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_AdaptiveAvgPool1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_AdaptiveAvgPool2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_AdaptiveMaxPool3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_AvgPool3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_AvgPool3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_BCELoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_BatchNorm2d_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_BatchNorm3d_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_Bilinear_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_CELU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_CELU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_CTCLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_CTCLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_CircularPad1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_CircularPad2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_CircularPad3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_ConstantPad3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_Conv1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_Conv3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_ConvTranspose1d_cuda_complex32, test/test_modules.py::TestModuleCUDA::test_forward_nn_ConvTranspose2d_cuda_complex128, test/test_modules.py::TestModuleCUDA::test_forward_nn_ConvTranspose2d_cuda_complex64, test/test_modules.py::TestModuleCUDA::test_forward_nn_ConvTranspose2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_ConvTranspose2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_ConvTranspose3d_cuda_complex32, test/test_modules.py::TestModuleCUDA::test_forward_nn_ConvTranspose3d_cuda_complex64, test/test_modules.py::TestModuleCUDA::test_forward_nn_CosineEmbeddingLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_CrossEntropyLoss_cuda_float16, test/test_modules.py::TestModuleCUDA::test_forward_nn_ELU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_ELU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_Embedding_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_FractionalMaxPool3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_GELU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_GRU_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_GRU_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_GaussianNLLLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_GroupNorm_cuda_float16, test/test_modules.py::TestModuleCUDA::test_forward_nn_GroupNorm_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_Hardswish_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_Hardswish_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_Hardtanh_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_HingeEmbeddingLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_HuberLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_InstanceNorm1d_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_InstanceNorm1d_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_InstanceNorm2d_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_InstanceNorm3d_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_InstanceNorm3d_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_KLDivLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_LPPool1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_LPPool2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_LPPool3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_LSTMCell_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_LSTM_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_LSTM_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_LayerNorm_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_LazyConv1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_LazyConv1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_LazyConv2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_LazyConvTranspose2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_LazyConvTranspose3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_LeakyReLU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_LogSoftmax_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_MSELoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_MarginRankingLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_MaxPool1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_MaxPool1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_MaxPool2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_MaxPool3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_Mish_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_MultiLabelMarginLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_MultiheadAttention_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_MultiheadAttention_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_MultiheadAttention_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_MultiheadAttention_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_PoissonNLLLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_RMSNorm_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_RMSNorm_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_RNNCell_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_RNN_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_ReLU6_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_ReflectionPad1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_ReflectionPad1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_ReflectionPad2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_ReflectionPad3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_SiLU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_Sigmoid_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_SmoothL1Loss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_Softmax2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_Softmax2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_Softmax_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_Softmin_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_Softplus_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_Tanhshrink_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_Threshold_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_TransformerEncoderLayer_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_TransformerEncoder_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_Transformer_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_ZeroPad1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_ZeroPad2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_ZeroPad2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_ZeroPad3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_AvgPool2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_AvgPool3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_BCELoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_BCEWithLogitsLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_BatchNorm1d_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_BatchNorm1d_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_BatchNorm2d_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_BatchNorm3d_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_Bilinear_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_CircularPad1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_CircularPad2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_ConvTranspose3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_Embedding_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_FractionalMaxPool2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_FractionalMaxPool3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_GaussianNLLLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_GroupNorm_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_Hardswish_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_HuberLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_InstanceNorm1d_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_InstanceNorm2d_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_InstanceNorm3d_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_LPPool1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_LPPool2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_LPPool3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_LSTMCell_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_LSTM_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_LazyConv2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_LazyConv3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_Linear_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_LocalResponseNorm_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_MSELoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_MaxPool1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_MaxPool2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_MaxPool3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_PoissonNLLLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_RMSNorm_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_RNNCell_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_ReLU6_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_ReLU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_ReflectionPad1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_ReplicationPad1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_SELU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_Sigmoid_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_SmoothL1Loss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_Softmax2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_Softmin_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_Softplus_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_TransformerDecoderLayer_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_TransformerEncoder_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_ZeroPad2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_AdaptiveAvgPool1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_AdaptiveAvgPool2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_AdaptiveAvgPool3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_AdaptiveMaxPool1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_AdaptiveMaxPool3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_BCELoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_BatchNorm1d_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_BatchNorm1d_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_BatchNorm2d_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_BatchNorm3d_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_BatchNorm3d_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_CTCLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_ConstantPad1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_ConvTranspose2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_ConvTranspose3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_CosineEmbeddingLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_CrossEntropyLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_Embedding_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_GLU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_GaussianNLLLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_InstanceNorm1d_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_InstanceNorm3d_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_LazyConvTranspose1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_Linear_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_LogSigmoid_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_MSELoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_Mish_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_MultiLabelSoftMarginLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_PReLU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_PoissonNLLLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_ReflectionPad1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_ReflectionPad2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_ReplicationPad2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_ReplicationPad3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_SELU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_SiLU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_Softmin_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_Softplus_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_Tanh_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_Tanhshrink_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_TransformerEncoder_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_AdaptiveMaxPool2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_AvgPool3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_BCELoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_BCEWithLogitsLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_BatchNorm1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_BatchNorm1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_BatchNorm2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_BatchNorm3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_CELU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_CircularPad1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_CircularPad3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_ConstantPad2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_Conv2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_Conv2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_ConvTranspose2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_ConvTranspose3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_CrossEntropyLoss_cuda_float16, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_ELU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_Embedding_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_FractionalMaxPool2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_FractionalMaxPool3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_GELU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_GRUCell_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_GRU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_GRU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_GaussianNLLLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_GroupNorm_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_Hardswish_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_Hardtanh_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_HuberLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_InstanceNorm1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_KLDivLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_L1Loss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_LSTMCell_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_LSTM_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_LayerNorm_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_LazyConvTranspose1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_LazyConvTranspose2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_LazyConvTranspose3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_Linear_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_LogSigmoid_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_LogSoftmax_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_LogSoftmax_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_MarginRankingLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_MaxPool2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_MaxPool3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_Mish_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_MultiLabelMarginLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_MultiLabelSoftMarginLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_MultiLabelSoftMarginLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_MultiMarginLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_MultiMarginLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_NLLLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_PReLU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_PReLU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_RNNCell_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_ReLU6_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_ReLU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_ReflectionPad1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_ReflectionPad1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_SELU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_SELU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_SmoothL1Loss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_SoftMarginLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_SoftMarginLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_Softmax2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_Softmin_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_Softplus_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_Softsign_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_Tanh_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_TransformerDecoderLayer_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_TransformerEncoderLayer_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_TransformerEncoder_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_Transformer_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_ZeroPad2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_ZeroPad3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_AdaptiveAvgPool2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_AdaptiveMaxPool3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_AdaptiveMaxPool3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_BCELoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_BatchNorm1d_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_BatchNorm2d_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_BatchNorm3d_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_Bilinear_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_CircularPad1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_ConstantPad1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_ConstantPad1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_ConstantPad2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_ConstantPad2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_ConstantPad3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_Conv1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_Conv2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_ConvTranspose1d_cuda_complex32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_ConvTranspose3d_cuda_complex128, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_ConvTranspose3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_Embedding_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_GLU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_GRU_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_GRU_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_HingeEmbeddingLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_HuberLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_HuberLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_InstanceNorm1d_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_InstanceNorm2d_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_InstanceNorm2d_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_InstanceNorm3d_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_KLDivLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_L1Loss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_LPPool2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_LSTM_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_LayerNorm_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_LazyConv2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_LazyConv2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_LazyConv3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_LazyConvTranspose2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_LocalResponseNorm_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_LogSigmoid_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_LogSigmoid_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_LogSoftmax_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_MSELoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_MSELoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_MarginRankingLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_MaxPool2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_MultiLabelMarginLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_NLLLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_RNNCell_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_ReLU6_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_ReLU6_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_ReLU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_ReLU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_ReflectionPad3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_ReplicationPad2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_SmoothL1Loss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_SoftMarginLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_SoftMarginLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_Softmax2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_Softshrink_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_Tanhshrink_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_Threshold_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_Threshold_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_TransformerEncoderLayer_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_TransformerEncoder_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_TransformerEncoder_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_Transformer_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_ZeroPad1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_AdaptiveAvgPool2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_AdaptiveMaxPool1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_AvgPool1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_AvgPool2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_BCELoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_BCEWithLogitsLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_BatchNorm2d_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_BatchNorm2d_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_BatchNorm3d_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_CELU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_CTCLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_CircularPad1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_ConstantPad1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_Conv2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_Conv3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_ConvTranspose1d_cuda_complex128, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_ConvTranspose2d_cuda_complex128, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_ConvTranspose2d_cuda_complex32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_ConvTranspose2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_ConvTranspose2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_ConvTranspose3d_cuda_complex128, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_ConvTranspose3d_cuda_complex64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_ConvTranspose3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_ELU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_FractionalMaxPool2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_GELU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_GLU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_GRUCell_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_GRU_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_GRU_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_GRU_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_GaussianNLLLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_GroupNorm_cuda_bfloat16, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_GroupNorm_cuda_float16, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_Hardshrink_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_HuberLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_HuberLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_InstanceNorm1d_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_InstanceNorm1d_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_InstanceNorm2d_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_InstanceNorm2d_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_LPPool2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_LPPool2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_LPPool3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_LSTM_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_LazyConv2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_LazyConv2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_LazyConv3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_LazyConvTranspose2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_LazyConvTranspose3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_LeakyReLU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_Linear_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_LocalResponseNorm_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_LogSigmoid_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_MaxPool1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_Mish_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_MultiLabelMarginLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_MultiMarginLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_MultiheadAttention_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_MultiheadAttention_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_NLLLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_RMSNorm_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_RNNCell_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_RNN_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_ReflectionPad1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_ReflectionPad1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_ReplicationPad1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_ReplicationPad2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_ReplicationPad2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_SELU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_Sigmoid_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_Sigmoid_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_SmoothL1Loss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_SmoothL1Loss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_Softmax2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_Softmax2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_Softmax_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_Softmin_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_Threshold_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_TransformerEncoderLayer_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_TransformerEncoderLayer_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_TransformerEncoderLayer_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_TransformerEncoder_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_TransformerEncoder_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_TransformerEncoder_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_TransformerEncoder_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_ZeroPad1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_ZeroPad2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_ZeroPad2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_ZeroPad3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_ZeroPad3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_AdaptiveAvgPool2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_AdaptiveMaxPool1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_AdaptiveMaxPool3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_AdaptiveMaxPool3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_AvgPool1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_AvgPool1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_AvgPool2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_AvgPool3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_BCELoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_BatchNorm1d_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_BatchNorm1d_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_BatchNorm3d_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_Bilinear_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_CELU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_CTCLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_ConstantPad1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_ConstantPad1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_ConstantPad2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_ConstantPad3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_Conv1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_Conv3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_ConvTranspose1d_cuda_complex32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_ConvTranspose1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_ConvTranspose2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_ConvTranspose3d_cuda_complex64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_ConvTranspose3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_ELU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_FractionalMaxPool2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_FractionalMaxPool3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_GLU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_GRUCell_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_GRU_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_GroupNorm_cuda_float16, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_GroupNorm_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_Hardswish_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_HingeEmbeddingLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_InstanceNorm1d_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_InstanceNorm1d_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_InstanceNorm2d_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_InstanceNorm2d_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_InstanceNorm3d_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_InstanceNorm3d_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_LPPool1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_LPPool1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_LPPool2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_LPPool3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_LSTMCell_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_LSTMCell_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_LayerNorm_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_LayerNorm_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_LazyConv1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_LazyConv1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_LazyConv3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_LazyConvTranspose2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_LazyConvTranspose3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_LogSoftmax_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_LogSoftmax_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_MSELoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_MaxPool1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_MaxPool3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_Mish_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_MultiLabelMarginLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_MultiLabelSoftMarginLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_MultiheadAttention_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_MultiheadAttention_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_PReLU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_RMSNorm_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_RNNCell_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_RNN_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_ReLU6_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_ReLU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_ReLU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_ReflectionPad2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_ReflectionPad2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_ReflectionPad3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_ReplicationPad1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_ReplicationPad2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_SELU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_SiLU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_Softmax_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_Softmax_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_Softsign_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_Tanhshrink_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_Tanhshrink_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_TransformerEncoderLayer_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_TransformerEncoder_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_TransformerEncoder_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_ZeroPad1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_ZeroPad3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_AdaptiveAvgPool2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_AdaptiveMaxPool1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_AdaptiveMaxPool1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_AdaptiveMaxPool2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_AvgPool2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_BCELoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_BCEWithLogitsLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_BatchNorm1d_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_BatchNorm1d_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_BatchNorm3d_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_BatchNorm3d_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_BatchNorm3d_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_Bilinear_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_CELU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_CELU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_CTCLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_CircularPad1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_CircularPad1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_CircularPad2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_ConvTranspose1d_cuda_complex32, test/test_modules.py::TestModuleCUDA::test_repr_nn_ConvTranspose2d_cuda_complex128, test/test_modules.py::TestModuleCUDA::test_repr_nn_ConvTranspose2d_cuda_complex32, test/test_modules.py::TestModuleCUDA::test_repr_nn_CrossEntropyLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_ELU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_FractionalMaxPool3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_FractionalMaxPool3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_GELU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_GRUCell_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_GRU_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_GRU_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_GroupNorm_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_GroupNorm_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_Hardshrink_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_HingeEmbeddingLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_HuberLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_InstanceNorm1d_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_InstanceNorm3d_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_KLDivLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_L1Loss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_LPPool1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_LSTMCell_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_LSTMCell_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_LSTM_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_LSTM_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_LazyConv1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_LazyConv2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_LazyConvTranspose1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_LazyConvTranspose1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_LazyConvTranspose3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_Linear_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_LogSigmoid_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_MaxPool3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_Mish_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_Mish_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_MultiLabelMarginLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_MultiheadAttention_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_MultiheadAttention_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_NLLLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_PReLU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_RMSNorm_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_RNN_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_ReLU6_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_ReLU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_ReflectionPad1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_ReflectionPad2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_ReplicationPad1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_ReplicationPad2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_SmoothL1Loss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_SoftMarginLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_Softmax2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_Softmin_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_Softplus_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_Softshrink_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_Softsign_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_Tanh_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_Threshold_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_Threshold_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_TransformerEncoderLayer_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_TransformerEncoder_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_TransformerEncoder_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_ZeroPad1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_ZeroPad1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_AdaptiveAvgPool2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_AdaptiveAvgPool3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_AdaptiveMaxPool1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_AdaptiveMaxPool3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_AdaptiveMaxPool3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_AvgPool1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_BCEWithLogitsLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_BatchNorm1d_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_BatchNorm2d_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_BatchNorm3d_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_CTCLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_ConstantPad1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_ConstantPad2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_Conv1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_Conv2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_Conv3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_ConvTranspose1d_cuda_complex64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_ConvTranspose1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_ConvTranspose3d_cuda_complex128, test/test_modules.py::TestModuleCUDA::test_save_load_nn_ConvTranspose3d_cuda_complex32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_CrossEntropyLoss_cuda_float16, test/test_modules.py::TestModuleCUDA::test_save_load_nn_CrossEntropyLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_Embedding_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_GELU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_GELU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_GRUCell_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_GRU_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_GRU_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_GRU_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_GRU_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_GroupNorm_cuda_bfloat16, test/test_modules.py::TestModuleCUDA::test_save_load_nn_GroupNorm_cuda_float16, test/test_modules.py::TestModuleCUDA::test_save_load_nn_Hardshrink_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_Hardshrink_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_Hardswish_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_Hardtanh_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_Hardtanh_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_HuberLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_InstanceNorm3d_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_KLDivLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_L1Loss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_L1Loss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_LPPool2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_LayerNorm_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_LazyConv1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_LazyConv3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_LazyConvTranspose1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_LazyConvTranspose1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_LazyConvTranspose3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_Linear_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_Linear_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_LocalResponseNorm_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_LogSigmoid_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_MSELoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_MSELoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_MarginRankingLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_MaxPool1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_Mish_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_MultiMarginLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_MultiheadAttention_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_NLLLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_PoissonNLLLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_RMSNorm_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_RNN_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_RNN_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_ReLU6_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_ReLU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_ReLU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_ReflectionPad1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_ReflectionPad1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_ReplicationPad1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_SELU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_SELU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_SiLU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_Sigmoid_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_SmoothL1Loss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_SoftMarginLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_Softmax2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_Softmax2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_Softmin_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_Softplus_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_Softsign_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_Tanh_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_Tanhshrink_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_TransformerEncoderLayer_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_TransformerEncoderLayer_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_TransformerEncoder_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_Transformer_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_ZeroPad3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_AdaptiveAvgPool2d_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_AdaptiveMaxPool1d_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_AdaptiveMaxPool2d_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_AdaptiveMaxPool2d_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_AvgPool1d_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_AvgPool2d_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_BCELoss_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_BCEWithLogitsLoss_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_BCEWithLogitsLoss_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_BatchNorm1d_eval_mode_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_BatchNorm1d_train_mode_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_BatchNorm3d_eval_mode_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_BatchNorm3d_eval_mode_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_BatchNorm3d_train_mode_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_CELU_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_CTCLoss_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_CircularPad1d_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_CircularPad2d_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_CircularPad3d_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_ConstantPad2d_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_Conv2d_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_Conv3d_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_ConvTranspose1d_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_ConvTranspose3d_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_CosineEmbeddingLoss_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_CosineEmbeddingLoss_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_Embedding_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_FractionalMaxPool2d_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_GRU_eval_mode_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_GRU_eval_mode_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_HingeEmbeddingLoss_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_InstanceNorm1d_eval_mode_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_InstanceNorm1d_eval_mode_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_InstanceNorm1d_train_mode_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_InstanceNorm2d_eval_mode_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_InstanceNorm2d_train_mode_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_InstanceNorm3d_eval_mode_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_InstanceNorm3d_train_mode_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_KLDivLoss_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_LPPool1d_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_LPPool3d_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_LPPool3d_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_LSTMCell_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_LSTM_train_mode_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_LSTM_train_mode_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_LayerNorm_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_Linear_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_LocalResponseNorm_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_LogSoftmax_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_MSELoss_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_MaxPool3d_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_MaxPool3d_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_Mish_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_MultiLabelMarginLoss_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_MultiLabelSoftMarginLoss_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_MultiheadAttention_eval_mode_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_MultiheadAttention_train_mode_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_PReLU_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_PoissonNLLLoss_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_RMSNorm_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_RNNCell_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_RNN_eval_mode_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_ReflectionPad3d_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_ReplicationPad2d_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_ReplicationPad3d_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_SELU_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_SELU_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_SiLU_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_Sigmoid_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_SmoothL1Loss_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_Softmax2d_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_Softmax_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_Softmin_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_Softplus_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_Softplus_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_Softsign_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_Tanhshrink_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_Tanhshrink_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_TransformerDecoderLayer_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_TransformerEncoderLayer_eval_mode_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_TransformerEncoder_eval_mode_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_TransformerEncoder_train_mode_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_Transformer_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_ZeroPad1d_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_ZeroPad2d_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_ZeroPad3d_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_AdaptiveAvgPool1d_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_AdaptiveAvgPool1d_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_AdaptiveAvgPool2d_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_AdaptiveAvgPool2d_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_AdaptiveAvgPool3d_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_AdaptiveAvgPool3d_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_AdaptiveAvgPool3d_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_AdaptiveMaxPool1d_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_AdaptiveMaxPool1d_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_AdaptiveMaxPool2d_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_AdaptiveMaxPool2d_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_AdaptiveMaxPool3d_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_AvgPool1d_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_AvgPool1d_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_AvgPool1d_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_AvgPool2d_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_AvgPool3d_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_AvgPool3d_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_AvgPool3d_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_BCELoss_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_BCEWithLogitsLoss_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_BCEWithLogitsLoss_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_BCEWithLogitsLoss_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_BatchNorm1d_eval_mode_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_BatchNorm1d_eval_mode_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_BatchNorm1d_train_mode_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_BatchNorm2d_eval_mode_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_BatchNorm2d_eval_mode_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_BatchNorm2d_train_mode_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_BatchNorm3d_eval_mode_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_BatchNorm3d_train_mode_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Bilinear_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Bilinear_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_CELU_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_CTCLoss_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_CircularPad1d_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_CircularPad1d_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_CircularPad2d_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_CircularPad3d_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_CircularPad3d_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_ConstantPad1d_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_ConstantPad1d_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_ConstantPad1d_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_ConstantPad2d_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_ConstantPad2d_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_ConstantPad2d_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_ConstantPad3d_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Conv2d_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Conv2d_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Conv3d_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Conv3d_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_ConvTranspose1d_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_ConvTranspose2d_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_CosineEmbeddingLoss_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_CosineEmbeddingLoss_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_CosineEmbeddingLoss_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_CrossEntropyLoss_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_CrossEntropyLoss_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_ELU_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Embedding_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Embedding_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_FractionalMaxPool3d_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_GRUCell_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_GRU_eval_mode_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_GRU_eval_mode_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_GRU_eval_mode_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_GRU_train_mode_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_GaussianNLLLoss_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_GroupNorm_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_GroupNorm_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_GroupNorm_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Hardswish_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Hardtanh_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_HingeEmbeddingLoss_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_HingeEmbeddingLoss_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_HuberLoss_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_InstanceNorm1d_eval_mode_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_InstanceNorm1d_eval_mode_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_InstanceNorm2d_eval_mode_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_InstanceNorm2d_eval_mode_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_InstanceNorm2d_train_mode_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_InstanceNorm2d_train_mode_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_InstanceNorm3d_eval_mode_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_KLDivLoss_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_L1Loss_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_LPPool2d_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_LPPool3d_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_LSTMCell_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_LSTM_eval_mode_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_LSTM_eval_mode_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_LSTM_train_mode_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Linear_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Linear_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Linear_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_LocalResponseNorm_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_LocalResponseNorm_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_LocalResponseNorm_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_LogSigmoid_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_LogSigmoid_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_LogSoftmax_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_MSELoss_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_MarginRankingLoss_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_MarginRankingLoss_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_MaxPool1d_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_MaxPool2d_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_MaxPool2d_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Mish_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Mish_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Mish_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_MultiLabelMarginLoss_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_MultiLabelMarginLoss_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_MultiLabelSoftMarginLoss_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_MultiLabelSoftMarginLoss_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_MultiheadAttention_train_mode_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_MultiheadAttention_train_mode_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_MultiheadAttention_train_mode_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_NLLLoss_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_PReLU_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_PReLU_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_PoissonNLLLoss_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_PoissonNLLLoss_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_PoissonNLLLoss_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_RMSNorm_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_RMSNorm_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_RNNCell_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_RNNCell_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_RNN_eval_mode_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_ReLU6_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_ReLU6_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_ReLU6_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_ReLU_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_ReflectionPad1d_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_ReflectionPad2d_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_ReplicationPad2d_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_ReplicationPad2d_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_ReplicationPad3d_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Sigmoid_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_SmoothL1Loss_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_SmoothL1Loss_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_SoftMarginLoss_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_SoftMarginLoss_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Softmax2d_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Softmax_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Softmin_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Softplus_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Softplus_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Softplus_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Softshrink_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Softshrink_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Softshrink_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Softsign_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Softsign_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Softsign_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Tanh_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Tanhshrink_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Threshold_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Threshold_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_TransformerDecoderLayer_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_TransformerEncoderLayer_train_mode_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_TransformerEncoder_eval_mode_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_TransformerEncoder_train_mode_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_TransformerEncoder_train_mode_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Transformer_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Transformer_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_ZeroPad1d_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_ZeroPad2d_swap_True_set_grad_True_cuda_float32 2025-08-14T23:06:49.1022664Z 2025-08-14T23:06:49.1022804Z Running test_decomp 10/21 ... [2025-08-14 23:06:48.994634] 2025-08-14T23:06:49.1023091Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T23:06:49.1023824Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_decomp.py', '-m', 'not serial', '--shard-id=10', '--num-shards=21', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 23:06:48.995015] 2025-08-14T23:13:49.8625441Z 2025-08-14T23:13:49.8630450Z test_decomp 6/21 was successful, full logs can be found in artifacts with path test/test-reports/test_decomp_6.21_c7365808ab7baddb_.log 2025-08-14T23:13:49.8728934Z Running 441 items in this shard: test/test_decomp.py::TestDecompCUDA::test_comprehensive_T_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_T_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive___getitem___cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rdiv___cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rmul___cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive__chunk_cat_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive__segment_reduce_offsets_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_acosh_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_add_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_addcmul_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_addcmul_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_addmm_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_alias_copy_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_amax_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_amax_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_aminmax_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_aminmax_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_angle_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_angle_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_any_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_arange_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_argwhere_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_as_strided_copy_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_as_strided_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_asin_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atanh_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atleast_3d_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atleast_3d_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bfloat16_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bitwise_left_shift_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_block_diag_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_block_diag_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bmm_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_broadcast_tensors_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bucketize_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cat_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cat_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cfloat_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_clamp_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_clamp_max_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_clamp_min_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_column_stack_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_combinations_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_constant_pad_nd_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_copysign_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cos_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cumulative_trapezoid_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cumulative_trapezoid_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diag_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diag_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diag_embed_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diagflat_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diagonal_copy_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diagonal_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diagonal_scatter_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_digamma_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_dist_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_dsplit_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_dsplit_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_dstack_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_empty_like_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_empty_permuted_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_eq_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_eq_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_equal_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_equal_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_expm1_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_eye_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_fft2_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_fftshift_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_fftshift_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_fftshift_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_hfft2_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_hfft_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ifft2_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ifft2_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ifft_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_irfft_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_rfft_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fill_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fliplr_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_float_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_float_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_floor_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_floor_divide_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fmax_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fmax_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_hstack_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_hstack_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_reduce_mean_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isclose_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isin_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isnan_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isreal_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_item_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_binary_return_by_ref_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_binary_return_by_ref_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_unary_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_unary_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_lcm_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_cholesky_ex_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_cross_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_diagonal_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_diagonal_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_inv_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_inv_ex_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_lstsq_grad_oriented_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_lu_factor_ex_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_matrix_norm_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_norm_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_pinv_hermitian_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_vander_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_vecdot_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linspace_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linspace_tensor_overload_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log2_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log2_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log_softmax_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logical_and_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logical_xor_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logit_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logspace_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logspace_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logspace_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_lu_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mT_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_amin_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_fill_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_log_softmax_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_logsumexp_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_prod_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_scatter_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_sum_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_var_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_matrix_exp_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_meshgrid_list_of_tensors_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_meshgrid_variadic_tensors_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_min_reduction_no_dim_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_min_reduction_no_dim_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_min_reduction_with_dim_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_min_reduction_with_dim_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mm_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mode_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_movedim_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_msort_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_multinomial_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nan_to_num_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nan_to_num_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nansum_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_narrow_copy_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ne_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_neg_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_empty_strided_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_empty_strided_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_full_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nextafter_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_adaptive_avg_pool1d_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_adaptive_max_pool3d_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_avg_pool3d_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_channel_shuffle_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_conv1d_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_conv2d_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_conv3d_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_conv_transpose1d_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_conv_transpose1d_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_conv_transpose2d_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_cosine_embedding_loss_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_cross_entropy_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_elu_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_embedding_bag_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_interpolate_area_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_l1_loss_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_max_unpool1d_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_max_unpool3d_grad_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_multi_head_attention_forward_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_normalize_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_circular_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_replicate_negative_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_replicate_negative_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_replicate_negative_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pairwise_distance_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pixel_shuffle_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pixel_shuffle_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_relu6_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_rrelu_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_rrelu_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_tanhshrink_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_tanhshrink_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_tanhshrink_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_triplet_margin_with_distance_loss_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_unfold_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_upsample_bilinear_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nonzero_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_normal_in_place_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ormqr_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_outer_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_outer_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_permute_copy_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_permute_copy_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_pow_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_prod_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_prod_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_rand_like_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_randn_like_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_reciprocal_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_remainder_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_repeat_interleave_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_reshape_as_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_resize_as__cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_resize_as__cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_roll_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_roll_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_roll_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scalar_tensor_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_reduce_mean_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_reduce_mean_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_reduce_prod_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_select_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_short_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sin_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sinc_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_slice_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_slice_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_softmax_with_dtype_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sort_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_bessel_j1_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_chebyshev_polynomial_v_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_chebyshev_polynomial_w_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_erfcx_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_hermite_polynomial_h_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_hermite_polynomial_he_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_legendre_polynomial_p_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_log_ndtr_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_log_ndtr_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_modified_bessel_i1_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_polygamma_special_polygamma_n_0_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_u_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_zeta_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_list_args_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_with_sizes_copy_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_with_sizes_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_square_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_square_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_squeeze_copy_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_squeeze_copy_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_squeeze_multiple_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_std_unbiased_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sub_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_t_copy_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_t_copy_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_t_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_t_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_take_along_dim_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_take_along_dim_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tan_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tan_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tan_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tan_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tensordot_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_trace_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_trace_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_trapezoid_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tril_indices_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_triu_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_triu_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_true_divide_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_trunc_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unbind_copy_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unbind_copy_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unfold_copy_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unfold_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unravel_index_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsafe_split_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_var_unbiased_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_view_as_complex_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_view_as_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_view_copy_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_view_copy_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_view_copy_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_where_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_xlogy_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_zeros_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick__unsafe_masked_index_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_acos_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_acosh_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_addr_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_alias_copy_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_all_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_all_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_any_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_arange_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_asin_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_asinh_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_atan2_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_atan_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_atan_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_baddbmm_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_baddbmm_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_bernoulli_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_bitwise_left_shift_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_bitwise_not_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_bitwise_xor_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_block_diag_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_cat_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_cat_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_cauchy_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_ceil_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_clamp_min_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_clone_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_clone_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_complex_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_constant_pad_nd_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_addr_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_take_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_transpose_copy_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_unsqueeze_copy_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_cosh_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_cosh_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_diagonal_copy_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_diagonal_copy_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_diagonal_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_digamma_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_div_trunc_rounding_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_div_trunc_rounding_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_div_trunc_rounding_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_dot_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_empty_strided_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_empty_strided_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_eq_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_fft_fft_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_fft_fftn_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_fft_fftn_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_fft_hfft2_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_fft_hfftn_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ifft2_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ifft2_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_fft_irfftn_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_fft_rfft_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_fft_rfft_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_floor_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_floor_divide_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_floor_divide_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_fmod_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_full_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_gt_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_index_add_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_index_add_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_index_copy_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_isinf_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_isnan_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_isneginf_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_isposinf_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_isposinf_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_linalg_cross_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_linalg_diagonal_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_linspace_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_linspace_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_linspace_tensor_overload_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_log10_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_log_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_log_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_logical_and_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_logical_or_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_logit_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_logspace_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_logspace_tensor_overload_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_logspace_tensor_overload_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_logsumexp_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_lt_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_masked_fill_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_maximum_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_mul_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_mvlgamma_mvlgamma_p_1_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_mvlgamma_mvlgamma_p_3_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_mvlgamma_mvlgamma_p_3_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_nansum_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_ne_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_new_empty_strided_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_new_full_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_new_full_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_new_ones_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_new_ones_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_nextafter_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_elu_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_glu_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_max_unpool3d_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_mish_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_mish_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_mse_loss_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_pad_constant_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_pad_constant_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_norm_fro_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_norm_fro_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_ones_like_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_polar_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_rad2deg_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_repeat_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_round_decimals_3_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_round_decimals_3_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_rsqrt_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_rsqrt_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_rsub_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_select_scatter_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_sinc_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_sinh_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_softmax_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_special_entr_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_special_entr_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_special_erfcx_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_special_i0e_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_special_i1e_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_special_log_ndtr_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_squeeze_copy_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_squeeze_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_sub_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_sub_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_sub_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_take_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_take_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_tanh_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_trace_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_trace_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_transpose_copy_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_tril_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_unfold_copy_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_unfold_copy_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_unfold_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_unsafe_split_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_unsafe_split_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_unsqueeze_copy_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_unsqueeze_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_unsqueeze_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_var_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_var_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_var_mean_unbiased_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_view_copy_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_view_copy_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_view_cuda_float32, test/test_decomp.py::DecompOneOffTestsCUDA::test_amp_batch_norm_backward_cuda 2025-08-14T23:13:49.8829376Z 2025-08-14T23:13:49.8829501Z Running test_decomp 18/21 ... [2025-08-14 23:13:49.863235] 2025-08-14T23:13:49.8829782Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T23:13:49.8830113Z GITHUB_RUN_ID, GITHUB_RUN_ATTEMPT, or ARTIFACTS_FILE_SUFFIX not set, not uploading 2025-08-14T23:13:49.8830480Z Uploading artifacts took 0.00 seconds 2025-08-14T23:13:49.8831214Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_decomp.py', '-m', 'not serial', '--shard-id=18', '--num-shards=21', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 23:13:49.863658] 2025-08-14T23:14:27.5657503Z 2025-08-14T23:14:27.5665674Z test_decomp 10/21 was successful, full logs can be found in artifacts with path test/test-reports/test_decomp_10.21_16c5aeea3bc2758c_.log 2025-08-14T23:14:27.5881993Z Running 427 items in this shard: test/test_decomp.py::TestDecompCUDA::test_comprehensive_H_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive___radd___cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rdiv___cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rmod___cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rmod___cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rpow___cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive__chunk_cat_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive__upsample_bilinear2d_aa_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_alias_copy_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_amin_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_aminmax_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_angle_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_any_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_any_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_argmax_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_argmax_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_argmin_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_argsort_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_argwhere_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_as_strided_partial_views_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_as_strided_scatter_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_asin_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_asinh_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atan2_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atan2_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atan_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atleast_1d_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_baddbmm_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bincount_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bincount_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bitwise_not_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bitwise_right_shift_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bmm_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bmm_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bool_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_broadcast_to_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_byte_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cartesian_prod_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cat_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cdouble_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ceil_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cholesky_inverse_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_chunk_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_clamp_max_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_clamp_min_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_complex_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_conj_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_constant_pad_nd_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_contiguous_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_copysign_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cos_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_count_nonzero_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cummax_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cumulative_trapezoid_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cumulative_trapezoid_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diag_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diag_embed_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diagflat_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diagonal_copy_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diagonal_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diagonal_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diff_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diff_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_div_floor_rounding_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_div_floor_rounding_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_div_trunc_rounding_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_double_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_double_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_empty_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_empty_strided_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_equal_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_erf_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_erf_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_exp2_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_exp_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_expand_copy_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_eye_cuda_float8_e5m2, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_fft_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_fft_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_fftn_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_fftshift_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_hfft_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ifftn_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ifftshift_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ihfft_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_irfft2_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_irfft2_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_rfft2_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_rfftn_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_rfftn_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fill_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fill_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fliplr_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fliplr_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_flipud_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_float_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fmax_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_gather_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_gradient_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_gt_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_half_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_half_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_hash_tensor_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_heaviside_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_histc_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_hsplit_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_i0_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_i0_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_add_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_add_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_copy_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_reduce_amax_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_reduce_prod_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_select_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_select_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isfinite_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isfinite_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isfinite_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isinf_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isposinf_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isreal_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isreal_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_binary_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_binary_return_by_ref_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_kron_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_kthvalue_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_lerp_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_cholesky_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_cond_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_cross_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_householder_product_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_norm_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_norm_subgradients_at_zero_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_vecdot_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log2_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log2_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log_softmax_with_dtype_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log_softmax_with_dtype_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logical_and_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logical_not_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logical_not_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logical_not_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logical_or_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logsumexp_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logsumexp_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mH_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_argmax_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_cumprod_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_fill_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_fill_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_norm_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_prod_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_prod_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_softmin_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_max_reduction_no_dim_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_meshgrid_variadic_tensors_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_meshgrid_variadic_tensors_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_msort_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_msort_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mvlgamma_mvlgamma_p_1_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mvlgamma_mvlgamma_p_1_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mvlgamma_mvlgamma_p_3_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mvlgamma_mvlgamma_p_3_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nan_to_num_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nansum_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nansum_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nansum_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_native_batch_norm_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_native_dropout_backward_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ne_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_empty_strided_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_zeros_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_adaptive_avg_pool3d_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_adaptive_avg_pool3d_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_adaptive_max_pool2d_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_adaptive_max_pool2d_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_avg_pool1d_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_bilinear_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_bilinear_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_binary_cross_entropy_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_binary_cross_entropy_with_logits_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_conv_transpose1d_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_dropout2d_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_dropout_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_embedding_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_feature_alpha_dropout_without_train_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_huber_loss_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_interpolate_nearest-exact_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_interpolate_nearest_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_interpolate_nearest_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_max_unpool1d_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_mish_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_reflect_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_replicate_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pairwise_distance_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pixel_shuffle_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_scaled_dot_product_attention_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_silu_complex_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_silu_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_softmin_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_softmin_with_dtype_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_softsign_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_softsign_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_softsign_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_tanhshrink_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_tanhshrink_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_tanhshrink_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_triplet_margin_with_distance_loss_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nonzero_static_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_norm_fro_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_norm_inf_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ones_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ones_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ormqr_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_outer_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_polygamma_polygamma_n_0_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_polygamma_polygamma_n_3_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_polygamma_polygamma_n_4_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_polygamma_polygamma_n_4_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_pow_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_prod_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_prod_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_put_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_rad2deg_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_rad2deg_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_rand_like_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_randint_like_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_randn_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ravel_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_real_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_real_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_repeat_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_repeat_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_reshape_as_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_reshape_as_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_resize_as__cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_resolve_neg_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_add_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_select_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_select_scatter_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sigmoid_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_signal_windows_general_cosine_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_signbit_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sin_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sin_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sin_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_slice_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_slice_scatter_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_softmax_with_dtype_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sort_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sparse_sampled_addmm_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_airy_ai_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_bessel_y0_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_bessel_y1_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_chebyshev_polynomial_w_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_modified_bessel_i1_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_modified_bessel_i1_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_ndtr_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_scaled_modified_bessel_k1_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_v_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_w_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_xlog1py_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_list_args_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_with_sizes_copy_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sqrt_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_squeeze_copy_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_std_mean_unbiased_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sum_to_size_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sum_to_size_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_take_along_dim_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_take_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_take_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tanh_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_torch_ops_aten__safe_softmax_default_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_transpose_copy_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_transpose_copy_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_trapz_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_trapz_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tril_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_true_divide_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unflatten_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unfold_copy_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unfold_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unfold_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_uniform_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unique_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsafe_chunk_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsqueeze_copy_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsqueeze_copy_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_view_as_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_view_as_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_view_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_view_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_vsplit_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_xlogy_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_zeros_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_zeros_like_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_zeros_like_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_abs_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_abs_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_addcmul_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_addmm_decomposed_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_addmm_decomposed_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_aminmax_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_arange_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_as_strided_copy_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_as_strided_copy_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_asinh_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_atan_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_bitwise_xor_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_block_diag_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_cat_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_cat_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_clone_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_nn_functional_hardswish_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_nn_functional_max_unpool2d_grad_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_split_list_args_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_stack_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_triu_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_cumprod_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_cumsum_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_diag_embed_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_diagonal_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_div_trunc_rounding_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_empty_like_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_eq_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_erfc_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_exp_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_fft_fft_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_fft_hfft2_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_fft_hfft_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ifft2_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ihfft2_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ihfft2_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ihfftn_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_fft_irfft_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_fft_rfft_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_fill_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_flip_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_fmax_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_fmin_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_full_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_index_add_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_index_copy_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_index_copy_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_index_copy_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_isneginf_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_isposinf_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_lcm_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_le_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_linspace_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_log1p_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_log2_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_log2_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_log_normal_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_logit_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_logsumexp_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_logsumexp_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_lt_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_meshgrid_variadic_tensors_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_mvlgamma_mvlgamma_p_1_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_native_layer_norm_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_neg_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_new_empty_strided_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_new_full_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_new_full_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_new_full_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_new_ones_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_hardsigmoid_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_logsigmoid_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_pad_constant_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_relu6_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_unfold_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_normal_in_place_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_ones_like_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_permute_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_pow_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_prod_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_randn_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_reciprocal_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_rot90_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_rot90_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_round_decimals_3_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_rsqrt_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_select_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_sgn_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_sgn_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_signbit_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_sin_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_sin_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_sinc_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_slice_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_slice_scatter_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_slice_scatter_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_special_i1_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_special_i1e_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_special_i1e_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_special_ndtri_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_special_ndtri_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_special_xlog1py_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_special_xlog1py_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_split_list_args_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_split_with_sizes_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_sqrt_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_stack_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_sum_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_take_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_tan_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_tanh_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_trace_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_trace_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_trace_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_transpose_copy_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_tril_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_trunc_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_unfold_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_unfold_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_unsqueeze_copy_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_var_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_var_unbiased_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_vdot_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_where_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_where_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_zero__cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_zeros_cuda_complex128, test/test_decomp.py::DecompOneOffTestsCUDA::test_elu_backward_cuda 2025-08-14T23:14:27.6108504Z 2025-08-14T23:14:27.6108876Z Running test_expanded_weights 1/1 ... [2025-08-14 23:14:27.566681] 2025-08-14T23:14:27.6109590Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T23:14:27.6111729Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_expanded_weights.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 23:14:27.567276] 2025-08-14T23:15:11.9658608Z 2025-08-14T23:15:11.9660453Z test_expanded_weights 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_expanded_weights_1.1_2d7a6eac33b09828_.log 2025-08-14T23:15:11.9837789Z Running 220 items in this shard: test/test_expanded_weights.py::TestExpandedWeightHelperFunctionCUDA::test_forward_helper_cuda, test/test_expanded_weights.py::TestExpandedWeightHelperFunctionCUDA::test_forward_helper_failure_args_cuda, test/test_expanded_weights.py::TestExpandedWeightHelperFunctionCUDA::test_set_grad_sample_if_exists_cuda, test/test_expanded_weights.py::TestExpandedWeightHelperFunctionCUDA::test_set_grad_sample_if_exists_failure_cuda, test/test_expanded_weights.py::TestExpandedWeightHelperFunctionCUDA::test_sum_over_all_but_batch_and_last_n_cuda, test/test_expanded_weights.py::TestExpandedWeightHelperFunctionCUDA::test_unpack_expanded_weight_or_tensor_cuda, test/test_expanded_weights.py::TestExpandedWeightHelperFunctionCUDA::test_unpack_expanded_weight_or_tensor_failure_cuda, test/test_expanded_weights.py::TestExpandedWeightHelperFunctionCUDA::test_unpack_expanded_weight_or_tensor_with_custom_function_cuda, test/test_expanded_weights.py::TestExpandedWeightFunctionalCUDA::test_cnn_model_mean_cuda, test/test_expanded_weights.py::TestExpandedWeightFunctionalCUDA::test_cnn_model_sum_cuda, test/test_expanded_weights.py::TestExpandedWeightFunctionalCUDA::test_embedding_model_cuda, test/test_expanded_weights.py::TestExpandedWeightFunctionalCUDA::test_expanded_weight_error_cuda, test/test_expanded_weights.py::TestExpandedWeightFunctionalCUDA::test_expanded_weight_forward_nn_functional_conv1d_cuda_bfloat16, test/test_expanded_weights.py::TestExpandedWeightFunctionalCUDA::test_expanded_weight_forward_nn_functional_conv1d_cuda_complex128, test/test_expanded_weights.py::TestExpandedWeightFunctionalCUDA::test_expanded_weight_forward_nn_functional_conv1d_cuda_complex32, test/test_expanded_weights.py::TestExpandedWeightFunctionalCUDA::test_expanded_weight_forward_nn_functional_conv1d_cuda_complex64, test/test_expanded_weights.py::TestExpandedWeightFunctionalCUDA::test_expanded_weight_forward_nn_functional_conv1d_cuda_float16, test/test_expanded_weights.py::TestExpandedWeightFunctionalCUDA::test_expanded_weight_forward_nn_functional_conv1d_cuda_float32, test/test_expanded_weights.py::TestExpandedWeightFunctionalCUDA::test_expanded_weight_forward_nn_functional_conv1d_cuda_float64, test/test_expanded_weights.py::TestExpandedWeightFunctionalCUDA::test_expanded_weight_forward_nn_functional_conv2d_cuda_bfloat16, test/test_expanded_weights.py::TestExpandedWeightFunctionalCUDA::test_expanded_weight_forward_nn_functional_conv2d_cuda_complex128, test/test_expanded_weights.py::TestExpandedWeightFunctionalCUDA::test_expanded_weight_forward_nn_functional_conv2d_cuda_complex32, test/test_expanded_weights.py::TestExpandedWeightFunctionalCUDA::test_expanded_weight_forward_nn_functional_conv2d_cuda_complex64, test/test_expanded_weights.py::TestExpandedWeightFunctionalCUDA::test_expanded_weight_forward_nn_functional_conv2d_cuda_float16, test/test_expanded_weights.py::TestExpandedWeightFunctionalCUDA::test_expanded_weight_forward_nn_functional_conv2d_cuda_float32, test/test_expanded_weights.py::TestExpandedWeightFunctionalCUDA::test_expanded_weight_forward_nn_functional_conv2d_cuda_float64, test/test_expanded_weights.py::TestExpandedWeightFunctionalCUDA::test_expanded_weight_forward_nn_functional_conv3d_cuda_bfloat16, test/test_expanded_weights.py::TestExpandedWeightFunctionalCUDA::test_expanded_weight_forward_nn_functional_conv3d_cuda_complex128, test/test_expanded_weights.py::TestExpandedWeightFunctionalCUDA::test_expanded_weight_forward_nn_functional_conv3d_cuda_complex32, test/test_expanded_weights.py::TestExpandedWeightFunctionalCUDA::test_expanded_weight_forward_nn_functional_conv3d_cuda_complex64, test/test_expanded_weights.py::TestExpandedWeightFunctionalCUDA::test_expanded_weight_forward_nn_functional_conv3d_cuda_float16, test/test_expanded_weights.py::TestExpandedWeightFunctionalCUDA::test_expanded_weight_forward_nn_functional_conv3d_cuda_float32, test/test_expanded_weights.py::TestExpandedWeightFunctionalCUDA::test_expanded_weight_forward_nn_functional_conv3d_cuda_float64, test/test_expanded_weights.py::TestExpandedWeightFunctionalCUDA::test_expanded_weight_forward_nn_functional_embedding_cuda_bfloat16, test/test_expanded_weights.py::TestExpandedWeightFunctionalCUDA::test_expanded_weight_forward_nn_functional_embedding_cuda_float16, test/test_expanded_weights.py::TestExpandedWeightFunctionalCUDA::test_expanded_weight_forward_nn_functional_embedding_cuda_float32, test/test_expanded_weights.py::TestExpandedWeightFunctionalCUDA::test_expanded_weight_forward_nn_functional_embedding_cuda_float64, test/test_expanded_weights.py::TestExpandedWeightFunctionalCUDA::test_expanded_weight_forward_nn_functional_group_norm_cuda_bfloat16, test/test_expanded_weights.py::TestExpandedWeightFunctionalCUDA::test_expanded_weight_forward_nn_functional_group_norm_cuda_float16, test/test_expanded_weights.py::TestExpandedWeightFunctionalCUDA::test_expanded_weight_forward_nn_functional_group_norm_cuda_float32, test/test_expanded_weights.py::TestExpandedWeightFunctionalCUDA::test_expanded_weight_forward_nn_functional_group_norm_cuda_float64, test/test_expanded_weights.py::TestExpandedWeightFunctionalCUDA::test_expanded_weight_forward_nn_functional_instance_norm_cuda_bfloat16, test/test_expanded_weights.py::TestExpandedWeightFunctionalCUDA::test_expanded_weight_forward_nn_functional_instance_norm_cuda_float16, test/test_expanded_weights.py::TestExpandedWeightFunctionalCUDA::test_expanded_weight_forward_nn_functional_instance_norm_cuda_float32, test/test_expanded_weights.py::TestExpandedWeightFunctionalCUDA::test_expanded_weight_forward_nn_functional_instance_norm_cuda_float64, test/test_expanded_weights.py::TestExpandedWeightFunctionalCUDA::test_expanded_weight_forward_nn_functional_layer_norm_cuda_bfloat16, test/test_expanded_weights.py::TestExpandedWeightFunctionalCUDA::test_expanded_weight_forward_nn_functional_layer_norm_cuda_float16, test/test_expanded_weights.py::TestExpandedWeightFunctionalCUDA::test_expanded_weight_forward_nn_functional_layer_norm_cuda_float32, test/test_expanded_weights.py::TestExpandedWeightFunctionalCUDA::test_expanded_weight_forward_nn_functional_layer_norm_cuda_float64, test/test_expanded_weights.py::TestExpandedWeightFunctionalCUDA::test_expanded_weight_forward_nn_functional_linear_cuda_bfloat16, test/test_expanded_weights.py::TestExpandedWeightFunctionalCUDA::test_expanded_weight_forward_nn_functional_linear_cuda_complex128, test/test_expanded_weights.py::TestExpandedWeightFunctionalCUDA::test_expanded_weight_forward_nn_functional_linear_cuda_complex64, test/test_expanded_weights.py::TestExpandedWeightFunctionalCUDA::test_expanded_weight_forward_nn_functional_linear_cuda_float16, test/test_expanded_weights.py::TestExpandedWeightFunctionalCUDA::test_expanded_weight_forward_nn_functional_linear_cuda_float32, test/test_expanded_weights.py::TestExpandedWeightFunctionalCUDA::test_expanded_weight_forward_nn_functional_linear_cuda_float64, test/test_expanded_weights.py::TestExpandedWeightFunctionalCUDA::test_expanded_weight_per_sample_grad_mean_nn_functional_conv1d_cuda_float64, test/test_expanded_weights.py::TestExpandedWeightFunctionalCUDA::test_expanded_weight_per_sample_grad_mean_nn_functional_conv2d_cuda_float64, test/test_expanded_weights.py::TestExpandedWeightFunctionalCUDA::test_expanded_weight_per_sample_grad_mean_nn_functional_conv3d_cuda_float64, test/test_expanded_weights.py::TestExpandedWeightFunctionalCUDA::test_expanded_weight_per_sample_grad_mean_nn_functional_embedding_cuda_float64, test/test_expanded_weights.py::TestExpandedWeightFunctionalCUDA::test_expanded_weight_per_sample_grad_mean_nn_functional_group_norm_cuda_float64, test/test_expanded_weights.py::TestExpandedWeightFunctionalCUDA::test_expanded_weight_per_sample_grad_mean_nn_functional_instance_norm_cuda_float64, test/test_expanded_weights.py::TestExpandedWeightFunctionalCUDA::test_expanded_weight_per_sample_grad_mean_nn_functional_layer_norm_cuda_float64, test/test_expanded_weights.py::TestExpandedWeightFunctionalCUDA::test_expanded_weight_per_sample_grad_mean_nn_functional_linear_cuda_float64, test/test_expanded_weights.py::TestExpandedWeightFunctionalCUDA::test_expanded_weight_per_sample_grad_sum_nn_functional_conv1d_cuda_float64, test/test_expanded_weights.py::TestExpandedWeightFunctionalCUDA::test_expanded_weight_per_sample_grad_sum_nn_functional_conv2d_cuda_float64, test/test_expanded_weights.py::TestExpandedWeightFunctionalCUDA::test_expanded_weight_per_sample_grad_sum_nn_functional_conv3d_cuda_float64, test/test_expanded_weights.py::TestExpandedWeightFunctionalCUDA::test_expanded_weight_per_sample_grad_sum_nn_functional_embedding_cuda_float64, test/test_expanded_weights.py::TestExpandedWeightFunctionalCUDA::test_expanded_weight_per_sample_grad_sum_nn_functional_group_norm_cuda_float64, test/test_expanded_weights.py::TestExpandedWeightFunctionalCUDA::test_expanded_weight_per_sample_grad_sum_nn_functional_instance_norm_cuda_float64, test/test_expanded_weights.py::TestExpandedWeightFunctionalCUDA::test_expanded_weight_per_sample_grad_sum_nn_functional_layer_norm_cuda_float64, test/test_expanded_weights.py::TestExpandedWeightFunctionalCUDA::test_expanded_weight_per_sample_grad_sum_nn_functional_linear_cuda_float64, test/test_expanded_weights.py::TestExpandedWeightFunctionalCUDA::test_expanded_weights_per_sample_grad_input_no_grad_nn_functional_conv1d_cuda_float64, test/test_expanded_weights.py::TestExpandedWeightFunctionalCUDA::test_expanded_weights_per_sample_grad_input_no_grad_nn_functional_conv2d_cuda_float64, test/test_expanded_weights.py::TestExpandedWeightFunctionalCUDA::test_expanded_weights_per_sample_grad_input_no_grad_nn_functional_conv3d_cuda_float64, test/test_expanded_weights.py::TestExpandedWeightFunctionalCUDA::test_expanded_weights_per_sample_grad_input_no_grad_nn_functional_embedding_cuda_float64, test/test_expanded_weights.py::TestExpandedWeightFunctionalCUDA::test_expanded_weights_per_sample_grad_input_no_grad_nn_functional_group_norm_cuda_float64, test/test_expanded_weights.py::TestExpandedWeightFunctionalCUDA::test_expanded_weights_per_sample_grad_input_no_grad_nn_functional_instance_norm_cuda_float64, test/test_expanded_weights.py::TestExpandedWeightFunctionalCUDA::test_expanded_weights_per_sample_grad_input_no_grad_nn_functional_layer_norm_cuda_float64, test/test_expanded_weights.py::TestExpandedWeightFunctionalCUDA::test_expanded_weights_per_sample_grad_input_no_grad_nn_functional_linear_cuda_float64, test/test_expanded_weights.py::TestExpandedWeightFunctionalCUDA::test_group_norm_error_cuda, test/test_expanded_weights.py::TestExpandedWeightFunctionalCUDA::test_group_norm_model_num_dim_1_cuda, test/test_expanded_weights.py::TestExpandedWeightFunctionalCUDA::test_group_norm_model_num_dim_2_cuda, test/test_expanded_weights.py::TestExpandedWeightFunctionalCUDA::test_group_norm_model_num_dim_3_cuda, test/test_expanded_weights.py::TestExpandedWeightFunctionalCUDA::test_instance_norm_model_num_dim_1_cuda, test/test_expanded_weights.py::TestExpandedWeightFunctionalCUDA::test_instance_norm_model_num_dim_2_cuda, test/test_expanded_weights.py::TestExpandedWeightFunctionalCUDA::test_instance_norm_model_num_dim_3_cuda, test/test_expanded_weights.py::TestExpandedWeightFunctionalCUDA::test_layer_norm_model_num_dim_1_cuda, test/test_expanded_weights.py::TestExpandedWeightFunctionalCUDA::test_layer_norm_model_num_dim_2_cuda, test/test_expanded_weights.py::TestExpandedWeightFunctionalCUDA::test_layer_norm_model_num_dim_3_cuda, test/test_expanded_weights.py::TestExpandedWeightFunctionalCUDA::test_unsupported_expand_weights_nn_functional_conv1d_cuda_float64, test/test_expanded_weights.py::TestExpandedWeightFunctionalCUDA::test_unsupported_expand_weights_nn_functional_conv2d_cuda_float64, test/test_expanded_weights.py::TestExpandedWeightFunctionalCUDA::test_unsupported_expand_weights_nn_functional_conv3d_cuda_float64, test/test_expanded_weights.py::TestExpandedWeightFunctionalCUDA::test_unsupported_expand_weights_nn_functional_embedding_cuda_float64, test/test_expanded_weights.py::TestExpandedWeightFunctionalCUDA::test_unsupported_expand_weights_nn_functional_group_norm_cuda_float64, test/test_expanded_weights.py::TestExpandedWeightFunctionalCUDA::test_unsupported_expand_weights_nn_functional_instance_norm_cuda_float64, test/test_expanded_weights.py::TestExpandedWeightFunctionalCUDA::test_unsupported_expand_weights_nn_functional_layer_norm_cuda_float64, test/test_expanded_weights.py::TestExpandedWeightFunctionalCUDA::test_unsupported_expand_weights_nn_functional_linear_cuda_float64, test/test_expanded_weights.py::TestExpandedWeightModuleCUDA::test_Conv1d_circular_stride2_pad2_cuda, test/test_expanded_weights.py::TestExpandedWeightModuleCUDA::test_Conv1d_circular_stride2_pad2_cuda_double_cuda, test/test_expanded_weights.py::TestExpandedWeightModuleCUDA::test_Conv1d_circular_stride2_pad2_multiple_inputs_cuda, test/test_expanded_weights.py::TestExpandedWeightModuleCUDA::test_Conv1d_cuda, test/test_expanded_weights.py::TestExpandedWeightModuleCUDA::test_Conv1d_cuda_double_cuda, test/test_expanded_weights.py::TestExpandedWeightModuleCUDA::test_Conv1d_multiple_inputs_cuda, test/test_expanded_weights.py::TestExpandedWeightModuleCUDA::test_Conv1d_pad1_cuda, test/test_expanded_weights.py::TestExpandedWeightModuleCUDA::test_Conv1d_pad1_cuda_double_cuda, test/test_expanded_weights.py::TestExpandedWeightModuleCUDA::test_Conv1d_pad1_multiple_inputs_cuda, test/test_expanded_weights.py::TestExpandedWeightModuleCUDA::test_Conv1d_pad1size1_cuda, test/test_expanded_weights.py::TestExpandedWeightModuleCUDA::test_Conv1d_pad1size1_cuda_double_cuda, test/test_expanded_weights.py::TestExpandedWeightModuleCUDA::test_Conv1d_pad1size1_multiple_inputs_cuda, test/test_expanded_weights.py::TestExpandedWeightModuleCUDA::test_Conv1d_pad2_cuda, test/test_expanded_weights.py::TestExpandedWeightModuleCUDA::test_Conv1d_pad2_cuda_double_cuda, test/test_expanded_weights.py::TestExpandedWeightModuleCUDA::test_Conv1d_pad2_multiple_inputs_cuda, test/test_expanded_weights.py::TestExpandedWeightModuleCUDA::test_Conv1d_pad2size1_cuda, test/test_expanded_weights.py::TestExpandedWeightModuleCUDA::test_Conv1d_pad2size1_cuda_double_cuda, test/test_expanded_weights.py::TestExpandedWeightModuleCUDA::test_Conv1d_pad2size1_multiple_inputs_cuda, test/test_expanded_weights.py::TestExpandedWeightModuleCUDA::test_Conv1d_reflect_stride2_pad2_cuda, test/test_expanded_weights.py::TestExpandedWeightModuleCUDA::test_Conv1d_reflect_stride2_pad2_cuda_double_cuda, test/test_expanded_weights.py::TestExpandedWeightModuleCUDA::test_Conv1d_reflect_stride2_pad2_multiple_inputs_cuda, test/test_expanded_weights.py::TestExpandedWeightModuleCUDA::test_Conv1d_replicate_stride2_pad2_cuda, test/test_expanded_weights.py::TestExpandedWeightModuleCUDA::test_Conv1d_replicate_stride2_pad2_cuda_double_cuda, test/test_expanded_weights.py::TestExpandedWeightModuleCUDA::test_Conv1d_replicate_stride2_pad2_multiple_inputs_cuda, test/test_expanded_weights.py::TestExpandedWeightModuleCUDA::test_Conv1d_stride_cuda, test/test_expanded_weights.py::TestExpandedWeightModuleCUDA::test_Conv1d_stride_cuda_double_cuda, test/test_expanded_weights.py::TestExpandedWeightModuleCUDA::test_Conv1d_stride_multiple_inputs_cuda, test/test_expanded_weights.py::TestExpandedWeightModuleCUDA::test_Conv1d_zero_batch_cuda, test/test_expanded_weights.py::TestExpandedWeightModuleCUDA::test_Conv1d_zero_batch_cuda_double_cuda, test/test_expanded_weights.py::TestExpandedWeightModuleCUDA::test_Conv1d_zero_batch_multiple_inputs_cuda, test/test_expanded_weights.py::TestExpandedWeightModuleCUDA::test_Conv1d_zeros_stride2_pad2_cuda, test/test_expanded_weights.py::TestExpandedWeightModuleCUDA::test_Conv1d_zeros_stride2_pad2_cuda_double_cuda, test/test_expanded_weights.py::TestExpandedWeightModuleCUDA::test_Conv1d_zeros_stride2_pad2_multiple_inputs_cuda, test/test_expanded_weights.py::TestExpandedWeightModuleCUDA::test_Conv2d_circular_stride2_pad2_cuda, test/test_expanded_weights.py::TestExpandedWeightModuleCUDA::test_Conv2d_circular_stride2_pad2_cuda_double_cuda, test/test_expanded_weights.py::TestExpandedWeightModuleCUDA::test_Conv2d_circular_stride2_pad2_multiple_inputs_cuda, test/test_expanded_weights.py::TestExpandedWeightModuleCUDA::test_Conv2d_cuda, test/test_expanded_weights.py::TestExpandedWeightModuleCUDA::test_Conv2d_cuda_double_cuda, test/test_expanded_weights.py::TestExpandedWeightModuleCUDA::test_Conv2d_dilated_cuda, test/test_expanded_weights.py::TestExpandedWeightModuleCUDA::test_Conv2d_dilated_cuda_double_cuda, test/test_expanded_weights.py::TestExpandedWeightModuleCUDA::test_Conv2d_dilated_multiple_inputs_cuda, test/test_expanded_weights.py::TestExpandedWeightModuleCUDA::test_Conv2d_multiple_inputs_cuda, test/test_expanded_weights.py::TestExpandedWeightModuleCUDA::test_Conv2d_no_bias_cuda, test/test_expanded_weights.py::TestExpandedWeightModuleCUDA::test_Conv2d_no_bias_cuda_double_cuda, test/test_expanded_weights.py::TestExpandedWeightModuleCUDA::test_Conv2d_no_bias_multiple_inputs_cuda, test/test_expanded_weights.py::TestExpandedWeightModuleCUDA::test_Conv2d_padding_cuda, test/test_expanded_weights.py::TestExpandedWeightModuleCUDA::test_Conv2d_padding_cuda_double_cuda, test/test_expanded_weights.py::TestExpandedWeightModuleCUDA::test_Conv2d_padding_multiple_inputs_cuda, test/test_expanded_weights.py::TestExpandedWeightModuleCUDA::test_Conv2d_reflect_stride2_pad2_cuda, test/test_expanded_weights.py::TestExpandedWeightModuleCUDA::test_Conv2d_reflect_stride2_pad2_cuda_double_cuda, test/test_expanded_weights.py::TestExpandedWeightModuleCUDA::test_Conv2d_reflect_stride2_pad2_multiple_inputs_cuda, test/test_expanded_weights.py::TestExpandedWeightModuleCUDA::test_Conv2d_replicate_stride2_pad2_cuda, test/test_expanded_weights.py::TestExpandedWeightModuleCUDA::test_Conv2d_replicate_stride2_pad2_cuda_double_cuda, test/test_expanded_weights.py::TestExpandedWeightModuleCUDA::test_Conv2d_replicate_stride2_pad2_multiple_inputs_cuda, test/test_expanded_weights.py::TestExpandedWeightModuleCUDA::test_Conv2d_strided_cuda, test/test_expanded_weights.py::TestExpandedWeightModuleCUDA::test_Conv2d_strided_cuda_double_cuda, test/test_expanded_weights.py::TestExpandedWeightModuleCUDA::test_Conv2d_strided_multiple_inputs_cuda, test/test_expanded_weights.py::TestExpandedWeightModuleCUDA::test_Conv2d_zero_batch_cuda, test/test_expanded_weights.py::TestExpandedWeightModuleCUDA::test_Conv2d_zero_batch_cuda_double_cuda, test/test_expanded_weights.py::TestExpandedWeightModuleCUDA::test_Conv2d_zero_batch_multiple_inputs_cuda, test/test_expanded_weights.py::TestExpandedWeightModuleCUDA::test_Conv2d_zeros_stride2_pad2_cuda, test/test_expanded_weights.py::TestExpandedWeightModuleCUDA::test_Conv2d_zeros_stride2_pad2_cuda_double_cuda, test/test_expanded_weights.py::TestExpandedWeightModuleCUDA::test_Conv2d_zeros_stride2_pad2_multiple_inputs_cuda, test/test_expanded_weights.py::TestExpandedWeightModuleCUDA::test_Conv3d_1x1x1_no_bias_cuda, test/test_expanded_weights.py::TestExpandedWeightModuleCUDA::test_Conv3d_1x1x1_no_bias_cuda_double_cuda, test/test_expanded_weights.py::TestExpandedWeightModuleCUDA::test_Conv3d_1x1x1_no_bias_multiple_inputs_cuda, test/test_expanded_weights.py::TestExpandedWeightModuleCUDA::test_Conv3d_circular_stride2_pad2_cuda, test/test_expanded_weights.py::TestExpandedWeightModuleCUDA::test_Conv3d_circular_stride2_pad2_cuda_double_cuda, test/test_expanded_weights.py::TestExpandedWeightModuleCUDA::test_Conv3d_circular_stride2_pad2_multiple_inputs_cuda, test/test_expanded_weights.py::TestExpandedWeightModuleCUDA::test_Conv3d_cuda, test/test_expanded_weights.py::TestExpandedWeightModuleCUDA::test_Conv3d_cuda_double_cuda, test/test_expanded_weights.py::TestExpandedWeightModuleCUDA::test_Conv3d_multiple_inputs_cuda, test/test_expanded_weights.py::TestExpandedWeightModuleCUDA::test_Conv3d_no_bias_cuda, test/test_expanded_weights.py::TestExpandedWeightModuleCUDA::test_Conv3d_no_bias_cuda_double_cuda, test/test_expanded_weights.py::TestExpandedWeightModuleCUDA::test_Conv3d_no_bias_multiple_inputs_cuda, test/test_expanded_weights.py::TestExpandedWeightModuleCUDA::test_Conv3d_replicate_stride2_pad2_cuda, test/test_expanded_weights.py::TestExpandedWeightModuleCUDA::test_Conv3d_replicate_stride2_pad2_cuda_double_cuda, test/test_expanded_weights.py::TestExpandedWeightModuleCUDA::test_Conv3d_replicate_stride2_pad2_multiple_inputs_cuda, test/test_expanded_weights.py::TestExpandedWeightModuleCUDA::test_Conv3d_stride_cuda, test/test_expanded_weights.py::TestExpandedWeightModuleCUDA::test_Conv3d_stride_cuda_double_cuda, test/test_expanded_weights.py::TestExpandedWeightModuleCUDA::test_Conv3d_stride_multiple_inputs_cuda, test/test_expanded_weights.py::TestExpandedWeightModuleCUDA::test_Conv3d_stride_padding_cuda, test/test_expanded_weights.py::TestExpandedWeightModuleCUDA::test_Conv3d_stride_padding_cuda_double_cuda, test/test_expanded_weights.py::TestExpandedWeightModuleCUDA::test_Conv3d_stride_padding_multiple_inputs_cuda, test/test_expanded_weights.py::TestExpandedWeightModuleCUDA::test_Conv3d_zero_batch_cuda, test/test_expanded_weights.py::TestExpandedWeightModuleCUDA::test_Conv3d_zero_batch_cuda_double_cuda, test/test_expanded_weights.py::TestExpandedWeightModuleCUDA::test_Conv3d_zero_batch_multiple_inputs_cuda, test/test_expanded_weights.py::TestExpandedWeightModuleCUDA::test_Conv3d_zeros_stride2_pad2_cuda, test/test_expanded_weights.py::TestExpandedWeightModuleCUDA::test_Conv3d_zeros_stride2_pad2_cuda_double_cuda, test/test_expanded_weights.py::TestExpandedWeightModuleCUDA::test_Conv3d_zeros_stride2_pad2_multiple_inputs_cuda, test/test_expanded_weights.py::TestExpandedWeightModuleCUDA::test_Embedding_cuda, test/test_expanded_weights.py::TestExpandedWeightModuleCUDA::test_Embedding_cuda_double_cuda, test/test_expanded_weights.py::TestExpandedWeightModuleCUDA::test_Embedding_discontiguous_cuda, test/test_expanded_weights.py::TestExpandedWeightModuleCUDA::test_Embedding_discontiguous_cuda_double_cuda, test/test_expanded_weights.py::TestExpandedWeightModuleCUDA::test_Embedding_discontiguous_multiple_inputs_cuda, test/test_expanded_weights.py::TestExpandedWeightModuleCUDA::test_Embedding_multiple_inputs_cuda, test/test_expanded_weights.py::TestExpandedWeightModuleCUDA::test_LayerNorm_3d_no_affine_large_feature_cuda, test/test_expanded_weights.py::TestExpandedWeightModuleCUDA::test_LayerNorm_3d_no_affine_large_feature_cuda_double_cuda, test/test_expanded_weights.py::TestExpandedWeightModuleCUDA::test_LayerNorm_3d_no_affine_large_feature_multiple_inputs_cuda, test/test_expanded_weights.py::TestExpandedWeightModuleCUDA::test_Linear_cuda, test/test_expanded_weights.py::TestExpandedWeightModuleCUDA::test_Linear_cuda_double_cuda, test/test_expanded_weights.py::TestExpandedWeightModuleCUDA::test_Linear_multiple_inputs_cuda, test/test_expanded_weights.py::TestExpandedWeightModuleCUDA::test_Linear_no_batch_dim_cuda, test/test_expanded_weights.py::TestExpandedWeightModuleCUDA::test_Linear_no_batch_dim_cuda_double_cuda, test/test_expanded_weights.py::TestExpandedWeightModuleCUDA::test_Linear_no_batch_dim_multiple_inputs_cuda, test/test_expanded_weights.py::TestExpandedWeightModuleCUDA::test_Linear_no_bias_cuda, test/test_expanded_weights.py::TestExpandedWeightModuleCUDA::test_Linear_no_bias_cuda_double_cuda, test/test_expanded_weights.py::TestExpandedWeightModuleCUDA::test_Linear_no_bias_multiple_inputs_cuda, test/test_expanded_weights.py::TestExpandedWeightModuleCUDA::test_module_nn_GRU_eval_mode_cuda_float32, test/test_expanded_weights.py::TestExpandedWeightModuleCUDA::test_module_nn_GRU_eval_mode_cuda_float64, test/test_expanded_weights.py::TestExpandedWeightModuleCUDA::test_module_nn_GRU_train_mode_cuda_float32, test/test_expanded_weights.py::TestExpandedWeightModuleCUDA::test_module_nn_GRU_train_mode_cuda_float64, test/test_expanded_weights.py::TestExpandedWeightModuleCUDA::test_module_nn_LSTM_eval_mode_cuda_float32, test/test_expanded_weights.py::TestExpandedWeightModuleCUDA::test_module_nn_LSTM_eval_mode_cuda_float64, test/test_expanded_weights.py::TestExpandedWeightModuleCUDA::test_module_nn_LSTM_train_mode_cuda_float32, test/test_expanded_weights.py::TestExpandedWeightModuleCUDA::test_module_nn_LSTM_train_mode_cuda_float64, test/test_expanded_weights.py::TestExpandedWeightModuleCUDA::test_module_nn_RNN_eval_mode_cuda_float32, test/test_expanded_weights.py::TestExpandedWeightModuleCUDA::test_module_nn_RNN_eval_mode_cuda_float64, test/test_expanded_weights.py::TestExpandedWeightModuleCUDA::test_module_nn_RNN_train_mode_cuda_float32, test/test_expanded_weights.py::TestExpandedWeightModuleCUDA::test_module_nn_RNN_train_mode_cuda_float64, test/test_expanded_weights.py::TestExpandedWeightModuleCUDA::test_per_sample_api_compute_batch_size_cuda, test/test_expanded_weights.py::TestExpandedWeightModuleCUDA::test_per_sample_api_compute_batch_size_not_pytreeable_cuda, test/test_expanded_weights.py::TestExpandedWeightModuleCUDA::test_per_sample_api_failing_cuda 2025-08-14T23:15:12.0011773Z 2025-08-14T23:15:12.0012112Z Running test_ops_gradients 1/4 ... [2025-08-14 23:15:11.966232] 2025-08-14T23:15:12.0012802Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T23:15:12.0014596Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_ops_gradients.py', '-m', 'not serial', '--shard-id=1', '--num-shards=4', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 23:15:11.966890] 2025-08-14T23:23:11.1116462Z 2025-08-14T23:23:11.1120861Z test_ops_gradients 1/4 was successful, full logs can be found in artifacts with path test/test-reports/test_ops_gradients_1.4_04856fe4cebead3a_.log 2025-08-14T23:23:11.1676690Z Running 1358 items in this shard: test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_NumpyCatCustomOp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_NumpyMulCustomOp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_NumpySplitCopyWithIntCustomOp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_T_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad___rmatmul___cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad___rmatmul___cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad___rmul___cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad___rpow___cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad__batch_norm_with_update_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad__native_batch_norm_legit_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_abs_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_acos_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_addbmm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_addmm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_addr_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_angle_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_angle_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_argmax_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_argsort_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_as_strided_partial_views_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_as_strided_partial_views_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_asinh_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_atan_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_atleast_2d_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_atleast_3d_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_baddbmm_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_bmm_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_bool_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_broadcast_tensors_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_broadcast_tensors_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_cat_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_cauchy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_cdouble_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_cfloat_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_chalf_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_cholesky_solve_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_chunk_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_clamp_min_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_column_stack_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_conj_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_contiguous_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_corrcoef_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_corrcoef_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_cos_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_cosh_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_cov_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_cross_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_cumsum_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_deg2rad_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_diag_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_diag_embed_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_diagonal_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_diagonal_scatter_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_div_trunc_rounding_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_dot_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_double_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_double_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_einsum_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_empty_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_eq_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_erfc_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_erfinv_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_expand_as_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_expand_copy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_fft_hfft_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_fft_ifftn_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_fft_ihfft_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_fill_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_flatten_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_flip_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_flipud_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_float_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_float_power_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_float_power_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_full_like_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_ge_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_geqrf_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_gradient_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_grid_sampler_2d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_gt_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_half_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_i0_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_index_put_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_index_reduce_mean_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_isclose_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_isinf_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_isinf_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_jiterator_4inputs_with_extra_args_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_jiterator_binary_return_by_ref_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_jiterator_binary_return_by_ref_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_kron_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_lerp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_cholesky_ex_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_cross_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_diagonal_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_eigh_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_eigvals_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_eigvalsh_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_inv_ex_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_ldl_factor_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_ldl_factor_ex_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_lu_factor_ex_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_lu_factor_ex_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_lu_solve_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_matrix_power_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_matrix_rank_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_multi_dot_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_norm_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_solve_ex_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_vander_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_vecdot_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linspace_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linspace_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linspace_tensor_overload_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_log_normal_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_log_softmax_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_log_softmax_with_dtype_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_logcumsumexp_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_logdet_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_logical_not_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_logical_xor_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_logspace_tensor_overload_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_long_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_lu_solve_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_lu_unpack_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_mH_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_mH_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_mT_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_masked_cumsum_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_masked_cumsum_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_masked_fill_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_masked_logaddexp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_masked_logsumexp_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_masked_mean_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_masked_scatter_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_masked_std_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_masked_sum_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_masked_var_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_matmul_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_matmul_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_matrix_exp_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_max_pool2d_with_indices_backward_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_meshgrid_list_of_tensors_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_meshgrid_variadic_tensors_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_min_binary_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_min_reduction_no_dim_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_msort_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_multinomial_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_mvlgamma_mvlgamma_p_3_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nanmean_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_narrow_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_narrow_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_new_full_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_new_ones_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_new_zeros_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_adaptive_avg_pool3d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_avg_pool2d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_avg_pool3d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_channel_shuffle_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_conv_transpose1d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_conv_transpose2d_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_cosine_embedding_loss_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_dropout3d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_gelu_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_hardswish_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_huber_loss_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_interpolate_bicubic_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_interpolate_linear_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_l1_loss_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_layer_norm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_linear_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_mse_loss_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_pad_constant_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_pad_reflect_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_pad_replicate_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_pad_replicate_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_pad_replicate_negative_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_pixel_shuffle_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_rms_norm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_scaled_dot_product_attention_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_smooth_l1_loss_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_softmin_with_dtype_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_softsign_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_unfold_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_upsample_nearest_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nonzero_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nonzero_static_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_norm_fro_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_norm_nuc_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_normal_in_place_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_outer_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_permute_copy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_permute_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_polygamma_polygamma_n_1_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_polygamma_polygamma_n_4_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_pow_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_quantile_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_randn_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_randn_like_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_ravel_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_resize_as__cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_rsqrt_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_scatter_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_scatter_reduce_amax_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_scatter_reduce_amin_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_scatter_reduce_mean_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_scatter_reduce_sum_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_sgn_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_signal_windows_bartlett_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_signal_windows_general_cosine_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_signal_windows_hann_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_signal_windows_nuttall_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_sin_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_sinh_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_slice_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_slice_scatter_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_special_chebyshev_polynomial_u_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_special_erfcx_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_special_laguerre_polynomial_l_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_special_modified_bessel_k0_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_special_ndtri_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_special_scaled_modified_bessel_k1_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_special_xlog1py_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_split_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_split_list_args_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_square_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_squeeze_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_stack_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_std_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_std_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_std_unbiased_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_sum_to_size_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_t_copy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_t_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_t_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_take_along_dim_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_take_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_tan_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_tanh_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_tile_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_trace_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_transpose_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_transpose_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_tril_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_trunc_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_unfold_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_unsafe_split_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_unsqueeze_copy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_unsqueeze_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_var_mean_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_var_unbiased_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_view_as_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_view_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_vsplit_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_zero__cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_zeros_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_zeros_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_NumpyCatCustomOp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_NumpyNonzeroCustomOp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_T_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad___radd___cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad___rmatmul___cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad___rmul___cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad___rpow___cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad___rpow___cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad__chunk_cat_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad__chunk_cat_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad__native_batch_norm_legit_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad__segment_reduce_offsets_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad__unsafe_masked_index_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_acosh_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_add_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_add_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_addbmm_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_addbmm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_addcdiv_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_addmm_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_addmm_decomposed_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_addr_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_alias_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_allclose_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_angle_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_argsort_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_as_strided_copy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_as_strided_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_asin_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_atan2_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_atan_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_atanh_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_atleast_2d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_auto_functionalize_simple_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_bernoulli_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_block_diag_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_ceil_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_cfloat_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_cholesky_inverse_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_chunk_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_complex_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_cos_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_cosh_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_cov_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_cross_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_cumsum_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_cumsum_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_diag_embed_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_diagflat_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_diagonal_copy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_diagonal_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_diagonal_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_diff_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_dist_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_double_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_einsum_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_einsum_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_empty_like_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_empty_permuted_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_erfc_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_erfinv_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_expand_as_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_expand_copy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_expand_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_fft_fftshift_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_fft_fftshift_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_fft_hfft_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_fft_hfftn_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_fft_ifft_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_fft_ifftn_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_fft_ifftshift_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_fft_ifftshift_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_fft_ihfft2_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_fft_irfft_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_flatten_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_flipud_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_floor_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_floor_divide_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_fmax_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_frexp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_full_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_gather_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_gather_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_gt_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_heaviside_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_hypot_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_igamma_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_imag_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_index_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_index_reduce_mean_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_index_select_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_inner_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_int_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_isclose_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_isinf_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_isnan_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_isreal_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_istft_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_jiterator_4inputs_with_extra_args_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_jiterator_binary_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_jiterator_binary_return_by_ref_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_jiterator_binary_return_by_ref_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_kron_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_ldexp_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_le_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_lerp_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_lerp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_lgamma_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linalg_cholesky_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linalg_cond_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linalg_det_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linalg_det_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linalg_diagonal_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linalg_eigh_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linalg_eigvals_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linalg_eigvalsh_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linalg_inv_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linalg_ldl_solve_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linalg_ldl_solve_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linalg_lstsq_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linalg_lu_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linalg_lu_factor_ex_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linalg_lu_solve_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linalg_matrix_power_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linalg_matrix_rank_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linalg_pinv_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linalg_pinv_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linalg_pinv_hermitian_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linalg_qr_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linalg_qr_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linalg_solve_ex_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linalg_solve_triangular_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linalg_svd_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linalg_svdvals_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linalg_tensorsolve_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linalg_tensorsolve_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linalg_vander_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_logaddexp2_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_logdet_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_logical_or_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_mT_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_mT_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_map_triple_nested_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_masked_argmin_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_masked_logaddexp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_masked_logsumexp_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_masked_mean_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_masked_median_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_masked_norm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_masked_normalize_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_masked_prod_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_masked_select_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_masked_softmin_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_masked_var_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_matmul_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_max_binary_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_max_reduction_no_dim_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_min_reduction_no_dim_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_mv_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nansum_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_narrow_copy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_native_layer_norm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_ne_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_new_empty_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_new_empty_strided_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_new_full_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_new_ones_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_adaptive_avg_pool1d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_adaptive_max_pool2d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_binary_cross_entropy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_binary_cross_entropy_with_logits_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_conv1d_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_conv2d_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_conv3d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_conv_transpose2d_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_conv_transpose3d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_cosine_similarity_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_feature_alpha_dropout_without_train_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_hardsigmoid_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_interpolate_bicubic_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_interpolate_linear_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_interpolate_nearest_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_layer_norm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_linear_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_local_response_norm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_max_pool2d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_max_pool3d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_max_unpool1d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_max_unpool1d_grad_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_max_unpool2d_grad_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_max_unpool3d_grad_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_multi_margin_loss_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_nll_loss_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_pad_constant_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_pad_replicate_negative_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_pairwise_distance_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_silu_complex_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_soft_margin_loss_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_tanhshrink_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_tanhshrink_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_triplet_margin_loss_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_triplet_margin_with_distance_loss_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_unfold_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_upsample_nearest_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nonzero_static_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_norm_nuc_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_normal_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_normal_in_place_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_normal_number_mean_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_ones_like_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_ormqr_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_outer_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_pca_lowrank_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_pca_lowrank_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_permute_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_pinverse_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_pinverse_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_polygamma_polygamma_n_0_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_positive_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_pow_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_quantile_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_rand_like_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_randint_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_randint_like_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_randn_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_ravel_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_repeat_interleave_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_reshape_as_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_resize__cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_resize_as__cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_scatter_add_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_scatter_reduce_amin_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_scatter_reduce_mean_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_searchsorted_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_select_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_sgn_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_sigmoid_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_signal_windows_general_cosine_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_slice_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_softmax_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_special_bessel_j0_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_special_bessel_y1_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_special_chebyshev_polynomial_u_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_special_erfcx_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_special_i0e_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_special_log_ndtr_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_special_modified_bessel_i1_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_special_modified_bessel_k0_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_special_ndtri_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_special_polygamma_special_polygamma_n_0_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_special_scaled_modified_bessel_k0_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_special_shifted_chebyshev_polynomial_t_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_special_shifted_chebyshev_polynomial_u_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_special_xlog1py_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_special_zeta_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_split_list_args_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_split_with_sizes_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_square_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_square_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_squeeze_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_squeeze_multiple_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_squeeze_multiple_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_stack_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_std_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_std_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_std_mean_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_std_mean_unbiased_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_stft_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_sum_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_svd_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_svd_lowrank_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_take_along_dim_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_tensor_split_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_tensordot_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_to_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_true_divide_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_unflatten_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_unfold_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_unique_consecutive_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_unsafe_chunk_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_unsqueeze_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_vdot_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_vdot_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_view_as_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_view_as_real_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_view_copy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_view_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_vsplit_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_where_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_H_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_NumpyCubeCustomOp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_NumpyNonzeroCustomOp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_NumpySplitCopyWithIntCustomOp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_NumpyTakeCustomOp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad___getitem___cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad___rmod___cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad__unsafe_masked_index_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad__unsafe_masked_index_put_accumulate_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_acosh_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_acosh_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_addcdiv_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_addmm_decomposed_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_addr_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_addr_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_alias_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_amax_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_amin_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_any_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_argmax_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_argsort_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_argwhere_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_as_strided_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_as_strided_partial_views_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_atleast_2d_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_atleast_3d_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_block_diag_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_broadcast_tensors_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_cdist_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_cdouble_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_chalf_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_cholesky_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_cholesky_solve_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_cholesky_solve_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_chunk_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_clone_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_cond_simple_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_conj_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_conj_physical_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_contiguous_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_cos_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_count_nonzero_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_cov_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_cummax_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_cummin_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_cumprod_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_cumsum_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_diag_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_diag_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_diag_embed_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_diagonal_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_diagonal_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_diagonal_scatter_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_div_floor_rounding_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_div_trunc_rounding_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_dot_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_dsplit_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_empty_like_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_equal_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_erfc_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_exp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_expand_as_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_eye_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_fft_fft2_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_fft_fftshift_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_fft_hfft2_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_fft_hfft2_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_fft_ifft_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_fft_irfft2_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_fill_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_flatten_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_float_power_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_floor_divide_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_gather_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_geometric_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_geqrf_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_gradient_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_gradient_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_grid_sampler_2d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_hsplit_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_hstack_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_hypot_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_i0_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_index_add_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_index_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_index_fill_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_index_put_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_index_reduce_amax_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_index_reduce_amin_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_index_reduce_mean_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_index_reduce_prod_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_index_select_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_isclose_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_isinf_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_isreal_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_jiterator_2inputs_2outputs_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_jiterator_binary_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_kron_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_kron_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linalg_cond_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linalg_cond_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linalg_cross_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linalg_cross_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linalg_det_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linalg_det_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linalg_eig_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linalg_eigvalsh_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linalg_inv_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linalg_inv_ex_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linalg_inv_ex_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linalg_ldl_factor_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linalg_ldl_factor_ex_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linalg_lu_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linalg_lu_solve_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linalg_matrix_power_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linalg_matrix_rank_hermitian_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linalg_multi_dot_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linalg_norm_subgradients_at_zero_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linalg_pinv_singular_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linalg_slogdet_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linalg_svd_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linalg_svdvals_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linalg_svdvals_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linalg_tensorsolve_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_log10_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_log1p_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_log2_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_log_softmax_with_dtype_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_logaddexp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_logcumsumexp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_logdet_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_logical_and_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_logical_or_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_logspace_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_logspace_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_long_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_mT_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_map_simple_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_masked_amax_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_masked_argmin_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_masked_cumprod_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_masked_fill_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_masked_normalize_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_masked_normalize_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_masked_softmax_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_masked_sum_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_masked_var_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_matmul_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_matrix_exp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_max_reduction_no_dim_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_min_binary_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_mv_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nan_to_num_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_narrow_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_ne_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_ne_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_new_empty_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_new_ones_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_new_ones_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_adaptive_avg_pool1d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_alpha_dropout_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_avg_pool1d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_conv2d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_conv3d_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_conv_transpose3d_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_conv_transpose3d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_dropout2d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_feature_alpha_dropout_without_train_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_feature_alpha_dropout_without_train_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_gelu_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_glu_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_hinge_embedding_loss_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_interpolate_bicubic_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_kl_div_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_l1_loss_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_leaky_relu_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_linear_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_max_pool2d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_max_unpool1d_grad_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_max_unpool2d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_max_unpool2d_grad_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_multi_head_attention_forward_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_multilabel_soft_margin_loss_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_pad_circular_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_pad_replicate_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_pairwise_distance_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_pixel_unshuffle_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_prelu_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_relu6_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_smooth_l1_loss_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_softmin_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_softmin_with_dtype_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_triplet_margin_with_distance_loss_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_triplet_margin_with_distance_loss_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_unfold_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_upsample_bilinear_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_norm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_norm_nuc_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_norm_nuc_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_normal_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_normal_number_mean_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_outer_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_outer_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_pinverse_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_polar_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_polygamma_polygamma_n_3_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_pow_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_quantile_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_randint_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_real_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_real_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_reciprocal_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_repeat_interleave_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_reshape_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_resize_as__cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_resolve_neg_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_round_decimals_3_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_round_decimals_neg_3_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_scatter_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_scatter_reduce_prod_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_scatter_reduce_sum_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_sgn_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_sigmoid_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_signal_windows_gaussian_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_signal_windows_general_hamming_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_sin_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_sinc_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_slice_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_softmax_with_dtype_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_softmax_with_dtype_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_special_bessel_j0_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_special_chebyshev_polynomial_u_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_special_chebyshev_polynomial_w_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_special_entr_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_special_i0e_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_special_scaled_modified_bessel_k1_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_special_shifted_chebyshev_polynomial_u_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_special_shifted_chebyshev_polynomial_v_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_special_zeta_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_split_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_squeeze_multiple_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_stack_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_std_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_std_unbiased_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_std_unbiased_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_sub_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_svd_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_take_along_dim_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_tan_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_tanh_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_tensordot_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_to_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_to_sparse_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_torch_ops_aten__safe_softmax_default_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_trace_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_trace_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_transpose_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_trapz_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_triu_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_true_divide_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_trunc_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_uniform_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_unsafe_chunk_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_unsafe_split_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_unsqueeze_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_var_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_vdot_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_view_as_complex_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_view_as_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_vsplit_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_vstack_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_where_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_zeros_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_zeros_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_NumpyNMSCustomOp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_NumpySplitCopyWithIntCustomOp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_NumpyTakeCustomOp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad___rmul___cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad___rsub___cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad__native_batch_norm_legit_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad__segment_reduce_lengths_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad__unsafe_masked_index_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad__unsafe_masked_index_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad__unsafe_masked_index_put_accumulate_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_addcdiv_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_addcmul_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_addmm_decomposed_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_addmm_decomposed_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_addmv_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_alias_copy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_allclose_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_angle_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_argmax_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_argmin_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_as_strided_copy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_as_strided_scatter_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_as_strided_scatter_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_atan_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_atleast_1d_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_atleast_2d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_baddbmm_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_bmm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_broadcast_to_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_broadcast_to_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_cat_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_cauchy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_cdouble_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_cdouble_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_char_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_cholesky_inverse_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_chunk_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_clamp_min_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_clone_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_clone_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_column_stack_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_combinations_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_constant_pad_nd_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_constant_pad_nd_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_cov_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_cummax_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_cumprod_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_cumsum_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_cumsum_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_cumulative_trapezoid_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_deg2rad_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_diag_embed_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_diagflat_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_diagonal_scatter_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_digamma_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_div_no_rounding_mode_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_dsplit_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_dstack_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_empty_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_empty_like_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_empty_strided_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_eq_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_equal_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_expand_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_expand_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_exponential_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_eye_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_fft_fft2_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_fft_fft_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_fft_fftn_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_fft_ihfft2_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_fft_irfft_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_fft_irfftn_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_fft_rfft2_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_fill_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_fill_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_flatten_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_fliplr_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_float_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_floor_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_fmin_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_full_like_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_geqrf_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_gradient_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_half_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_heaviside_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_histc_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_index_add_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_index_copy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_index_put_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_index_select_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_inner_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_inner_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_isin_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_isnan_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_isnan_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_jiterator_2inputs_2outputs_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_jiterator_4inputs_with_extra_args_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_jiterator_unary_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_kthvalue_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_lerp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_linalg_cholesky_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_linalg_cross_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_linalg_det_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_linalg_eig_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_linalg_householder_product_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_linalg_lstsq_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_linalg_lu_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_linalg_lu_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_linalg_lu_factor_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_linalg_lu_factor_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_linalg_lu_factor_ex_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_linalg_matrix_rank_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_linalg_matrix_rank_hermitian_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_linalg_norm_subgradients_at_zero_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_linalg_norm_subgradients_at_zero_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_linalg_pinv_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_linalg_pinv_hermitian_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_linalg_pinv_singular_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_linalg_pinv_singular_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_linalg_solve_ex_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_linalg_solve_ex_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_linalg_solve_triangular_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_linalg_svdvals_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_linalg_tensorinv_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_linalg_vecdot_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_linspace_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_linspace_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_log10_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_log1p_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_log1p_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_log_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_log_softmax_with_dtype_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_logaddexp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_logical_or_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_logical_or_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_logspace_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_logspace_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_long_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_lt_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_lu_solve_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_masked_argmax_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_masked_scatter_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_masked_std_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_matmul_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_max_reduction_with_dim_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_meshgrid_variadic_tensors_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_min_reduction_no_dim_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_mm_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_movedim_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_multinomial_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nansum_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nansum_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_narrow_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_native_dropout_backward_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_new_empty_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_new_empty_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_new_empty_strided_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_adaptive_max_pool1d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_adaptive_max_pool3d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_alpha_dropout_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_avg_pool2d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_avg_pool3d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_batch_norm_without_cudnn_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_binary_cross_entropy_with_logits_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_conv1d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_conv_transpose2d_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_cosine_similarity_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_dropout3d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_fractional_max_pool2d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_gaussian_nll_loss_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_glu_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_hardshrink_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_hinge_embedding_loss_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_interpolate_trilinear_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_linear_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_margin_ranking_loss_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_max_pool1d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_max_unpool2d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_max_unpool3d_grad_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_mish_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_multilabel_margin_loss_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_nll_loss_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_normalize_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_pad_circular_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_pad_constant_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_pad_reflect_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_pairwise_distance_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_pixel_shuffle_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_pixel_unshuffle_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_poisson_nll_loss_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_rms_norm_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_softmin_with_dtype_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_tanhshrink_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_triplet_margin_with_distance_loss_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nonzero_static_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_norm_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_norm_fro_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_norm_fro_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_norm_inf_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_norm_nuc_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_norm_nuc_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_normal_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_normal_in_place_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_normal_number_mean_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_ones_like_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_put_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_randint_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_randn_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_real_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_real_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_repeat_interleave_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_reshape_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_resize__cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_round_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_round_decimals_0_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_round_decimals_3_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_round_decimals_neg_3_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_rsqrt_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_scatter_reduce_amax_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_scatter_reduce_mean_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_scatter_reduce_sum_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_select_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_select_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_select_scatter_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_sgn_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_short_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_sigmoid_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_signal_windows_exponential_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_signal_windows_general_cosine_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_signal_windows_kaiser_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_signal_windows_nuttall_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_sinh_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_softmax_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_special_bessel_y1_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_special_erfcx_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_special_hermite_polynomial_h_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_special_legendre_polynomial_p_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_split_with_sizes_copy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_square_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_squeeze_multiple_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_squeeze_multiple_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_stack_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_std_unbiased_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_sub_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_sub_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_sum_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_sum_to_size_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_take_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_tensordot_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_tile_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_to_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_trace_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_transpose_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_trapezoid_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_trapz_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_triangular_solve_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_unbind_copy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_unbind_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_unflatten_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_uniform_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_uniform_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_unsafe_chunk_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_unsqueeze_copy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_var_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_view_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_view_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_vsplit_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_zeros_like_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_T_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad___getitem___cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad___radd___cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad___rmul___cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad__chunk_cat_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad__unsafe_masked_index_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_addcdiv_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_addcdiv_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_addcmul_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_addmm_decomposed_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_alias_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_allclose_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_arange_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_argmin_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_argwhere_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_as_strided_copy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_as_strided_scatter_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_asin_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_asinh_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_atan2_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_atan_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_bernoulli_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_bfloat16_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_broadcast_tensors_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_broadcast_tensors_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_cat_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_cat_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_cdist_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_cdouble_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_chalf_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_char_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_cholesky_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_cholesky_inverse_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_cholesky_solve_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_chunk_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_column_stack_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_combinations_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_complex_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_conj_physical_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_constant_pad_nd_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_corrcoef_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_cos_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_count_nonzero_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_cov_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_cross_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_cummax_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_cumsum_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_diagonal_scatter_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_diff_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_div_no_rounding_mode_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_div_no_rounding_mode_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_div_trunc_rounding_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_dsplit_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_empty_like_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_empty_permuted_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_erfinv_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_exp_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_exp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_expand_copy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_expand_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_eye_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_fft_fft2_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_fft_fft_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_fft_fftn_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_fft_fftn_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_fft_fftshift_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_fft_hfft2_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_fft_hfft_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_fft_ifft2_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_fft_ifftn_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_fft_ihfft_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_fft_ihfftn_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_fft_irfft2_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_flip_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_fliplr_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_float_power_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_frexp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_geometric_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_gradient_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_gt_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_hash_tensor_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_histc_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_igammac_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_imag_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_index_reduce_amin_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_int_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_isfinite_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_isnan_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_isnan_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_jiterator_2inputs_2outputs_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_jiterator_4inputs_with_extra_args_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_jiterator_binary_return_by_ref_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_jiterator_binary_return_by_ref_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_jiterator_unary_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_lerp_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_cholesky_ex_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_cond_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_cross_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_diagonal_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_eigvalsh_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_eigvalsh_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_householder_product_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_ldl_factor_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_ldl_factor_ex_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_lstsq_grad_oriented_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_lu_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_lu_factor_ex_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_lu_solve_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_matrix_rank_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_pinv_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_qr_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_qr_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_slogdet_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_solve_ex_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_svdvals_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_vecdot_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_vector_norm_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_vector_norm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_log10_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_log_softmax_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_logaddexp2_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_logical_and_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_logical_and_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_logit_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_logsumexp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_long_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_lu_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_lu_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_lu_solve_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_mT_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_masked_argmin_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_masked_fill_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_masked_prod_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_masked_select_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_masked_softmax_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_masked_std_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_matrix_exp_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_max_binary_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_max_reduction_with_dim_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_meshgrid_list_of_tensors_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_meshgrid_list_of_tensors_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_meshgrid_variadic_tensors_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_min_reduction_with_dim_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_movedim_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_mvlgamma_mvlgamma_p_1_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_narrow_copy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_narrow_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_native_batch_norm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_new_empty_strided_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_new_full_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_new_zeros_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_adaptive_avg_pool3d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_avg_pool3d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_binary_cross_entropy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_celu_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_conv1d_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_conv2d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_cosine_similarity_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_dropout2d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_dropout_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_embedding_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_fractional_max_pool2d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_group_norm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_interpolate_bicubic_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_interpolate_bilinear_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_interpolate_nearest-exact_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_layer_norm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_leaky_relu_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_max_pool2d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_max_pool3d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_max_unpool1d_grad_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_max_unpool2d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_max_unpool2d_grad_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_max_unpool3d_grad_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_mish_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_multi_head_attention_forward_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_normalize_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_pad_constant_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_pad_constant_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_pad_replicate_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_pad_replicate_negative_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_pad_replicate_negative_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_pairwise_distance_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_pdist_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_pixel_unshuffle_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_relu_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_silu_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_softmin_with_dtype_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_softmin_with_dtype_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_softplus_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_softshrink_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_softsign_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_triplet_margin_loss_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_triplet_margin_loss_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_triplet_margin_with_distance_loss_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nonzero_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nonzero_static_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_norm_nuc_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_ones_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_outer_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_permute_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_pinverse_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_polar_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_polygamma_polygamma_n_2_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_positive_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_positive_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_put_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_qr_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_rand_like_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_real_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_renorm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_repeat_interleave_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_reshape_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_roll_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_round_decimals_neg_3_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_scalar_tensor_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_scatter_add_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_scatter_reduce_mean_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_select_scatter_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_signal_windows_exponential_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_signal_windows_hamming_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_special_bessel_y0_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_special_chebyshev_polynomial_w_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_special_entr_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_special_hermite_polynomial_he_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_special_i1e_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_special_modified_bessel_i1_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_special_modified_bessel_k1_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_special_shifted_chebyshev_polynomial_t_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_special_shifted_chebyshev_polynomial_w_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_special_zeta_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_split_with_sizes_copy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_square_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_stack_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_stack_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_std_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_std_mean_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_std_mean_unbiased_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_sum_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_take_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_take_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_tan_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_tan_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_tensor_split_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_tensordot_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_to_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_torch_ops_aten__safe_softmax_default_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_transpose_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_transpose_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_trapezoid_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_triangular_solve_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_tril_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_triu_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_trunc_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_unbind_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_unfold_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_unfold_copy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_var_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_var_mean_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_var_mean_unbiased_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_var_mean_unbiased_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_view_as_real_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_view_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_vsplit_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_where_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_xlogy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_zero__cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_zeros_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_zeros_like_cuda_complex128 2025-08-14T23:23:11.2061857Z 2025-08-14T23:23:11.2062000Z Running functorch/test_ops 4/9 ... [2025-08-14 23:23:11.112904] 2025-08-14T23:23:11.2062288Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T23:23:11.2063031Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'functorch/test_ops.py', '-m', 'not serial', '--shard-id=4', '--num-shards=9', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 23:23:11.113246] 2025-08-14T23:24:08.3903503Z 2025-08-14T23:24:08.3904679Z test_decomp 18/21 was successful, full logs can be found in artifacts with path test/test-reports/test_decomp_18.21_653bc2199dc1855c_.log 2025-08-14T23:24:08.4134617Z Running 440 items in this shard: test/test_decomp.py::TestDecompCUDA::test_comprehensive_H_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_H_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive___radd___cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rand___cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rdiv___cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rmatmul___cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive__chunk_cat_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive__chunk_cat_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive__softmax_backward_data_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive__unsafe_masked_index_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_abs_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_abs_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_abs_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_acos_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_acos_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_addcmul_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_addmm_decomposed_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_alias_copy_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_all_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_allclose_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_aminmax_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_arange_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_argmin_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_argsort_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_as_strided_partial_views_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_asinh_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_asinh_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atan_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atleast_3d_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bernoulli_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bfloat16_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bfloat16_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bitwise_left_shift_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bitwise_not_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bool_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_broadcast_tensors_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cartesian_prod_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cat_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cdouble_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cdouble_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_chunk_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_chunk_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_clamp_min_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_column_stack_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_conj_physical_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_constant_pad_nd_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_constant_pad_nd_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_contiguous_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_count_nonzero_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cov_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cov_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cummax_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cumprod_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_deg2rad_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diag_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diag_embed_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diagflat_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diagonal_copy_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diagonal_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diagonal_scatter_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diff_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_div_no_rounding_mode_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_double_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_dstack_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_einsum_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_empty_like_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_empty_strided_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_eq_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_equal_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_erf_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_erfc_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_expand_as_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_expand_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_eye_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_fft2_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_hfft2_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_hfftn_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ifft2_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ifft_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ifftshift_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ifftshift_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ihfft_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_irfft2_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_irfft_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_irfftn_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fill_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_flatten_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_flatten_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_flip_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_float_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fmin_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fmin_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fmod_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ge_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_geometric_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_gt_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_hsplit_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_hstack_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_hypot_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_i0_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_copy_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_fill_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_put_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_reduce_amin_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_select_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isreal_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_item_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_item_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_item_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_2inputs_2outputs_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_2inputs_2outputs_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_4inputs_with_extra_args_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_kthvalue_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_kthvalue_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_lcm_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ldexp_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ldexp_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_cholesky_ex_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_ldl_factor_ex_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_lu_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_lu_solve_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_norm_subgradients_at_zero_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_solve_ex_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_vector_norm_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linspace_tensor_overload_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log10_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log10_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log_softmax_with_dtype_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logcumsumexp_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logical_and_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logspace_tensor_overload_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logspace_tensor_overload_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logspace_tensor_overload_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_lt_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_lu_solve_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mH_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_amin_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_logsumexp_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_logsumexp_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_select_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_sum_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_var_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_max_reduction_no_dim_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_maximum_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mean_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mean_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mean_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_minimum_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_minimum_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mode_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mode_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mvlgamma_mvlgamma_p_1_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mvlgamma_mvlgamma_p_3_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nanmedian_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nansum_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_narrow_copy_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_narrow_copy_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_narrow_copy_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_narrow_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_narrow_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_empty_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_empty_strided_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_ones_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_adaptive_avg_pool3d_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_adaptive_max_pool1d_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_avg_pool3d_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_batch_norm_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_celu_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_channel_shuffle_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_conv3d_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_cosine_similarity_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_cross_entropy_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_dropout_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_embedding_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_feature_alpha_dropout_with_train_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_gelu_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_group_norm_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_group_norm_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_hardshrink_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_hardswish_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_interpolate_bilinear_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_l1_loss_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_max_pool2d_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_multi_head_attention_forward_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_multi_head_attention_forward_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_multilabel_margin_loss_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_normalize_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_circular_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_reflect_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pixel_shuffle_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_poisson_nll_loss_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_relu_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_scaled_dot_product_attention_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_silu_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_smooth_l1_loss_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_softsign_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_tanhshrink_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_unfold_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nonzero_static_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_norm_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_norm_inf_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_normal_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_outer_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_pca_lowrank_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_permute_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_polygamma_polygamma_n_1_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_polygamma_polygamma_n_3_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_put_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_put_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_rad2deg_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_rand_like_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ravel_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_repeat_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_repeat_interleave_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_repeat_interleave_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_reshape_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_resize_as__cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_roll_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_rot90_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_rsqrt_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_add_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_reduce_amax_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_reduce_prod_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_select_scatter_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sgn_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sgn_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_signal_windows_hann_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sin_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sinh_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sort_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_airy_ai_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_bessel_y0_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_chebyshev_polynomial_u_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_chebyshev_polynomial_v_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_erfcx_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_hermite_polynomial_h_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_i1_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_i1_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_i1e_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_log_ndtr_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_modified_bessel_i0_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_modified_bessel_i1_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_modified_bessel_i1_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_polygamma_special_polygamma_n_0_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_u_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_w_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sqrt_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sqrt_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_squeeze_multiple_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_stack_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sub_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_t_copy_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_take_along_dim_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tan_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tensor_split_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_to_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_to_sparse_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_torch_ops_aten__safe_softmax_default_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_trapezoid_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_trapz_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tril_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tril_indices_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_triu_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_triu_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_true_divide_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unbind_copy_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unbind_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unflatten_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unflatten_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unfold_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unfold_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unique_consecutive_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsafe_chunk_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsqueeze_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_var_unbiased_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_vdot_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_view_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_vsplit_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_vstack_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_vstack_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_vstack_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_where_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_where_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_xlogy_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_zeros_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_zeros_like_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick__chunk_cat_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick__unsafe_masked_index_put_accumulate_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick__upsample_bilinear2d_aa_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_abs_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_abs_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_acos_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_acosh_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_add_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_addmm_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_alias_copy_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_all_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_aminmax_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_arange_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_as_strided_copy_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_as_strided_copy_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_as_strided_scatter_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_atan_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_bitwise_and_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_bitwise_right_shift_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_block_diag_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_clamp_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_conj_physical_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_constant_pad_nd_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_addcdiv_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_linalg_cross_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_mvlgamma_mvlgamma_p_5_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_select_scatter_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_cosh_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_count_nonzero_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_count_nonzero_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_count_nonzero_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_cumprod_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_cumsum_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_diag_embed_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_diagonal_copy_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_diagonal_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_diagonal_scatter_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_dist_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_div_no_rounding_mode_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_empty_like_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_empty_like_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_eq_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_eq_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_eq_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_erfc_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_erfinv_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_expand_copy_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_eye_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_fft_fftn_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_fft_hfft2_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_fft_hfft2_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_fft_hfftn_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ifftn_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ihfft_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ihfft_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ihfftn_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_fft_irfftn_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_fft_rfftn_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_fill_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_floor_divide_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_fmin_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_fmod_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_full_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_heaviside_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_heaviside_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_index_add_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_index_fill_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_isin_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_isin_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_isinf_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_isinf_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_isposinf_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_item_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_linalg_cross_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_linalg_vector_norm_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_linspace_tensor_overload_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_log1p_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_log_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_logical_and_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_logical_not_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_logical_or_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_logical_or_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_logspace_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_logspace_tensor_overload_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_maximum_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_mvlgamma_mvlgamma_p_1_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_mvlgamma_mvlgamma_p_5_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_narrow_copy_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_ne_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_new_full_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_hardtanh_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_max_unpool3d_grad_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_norm_fro_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_normal_in_place_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_normal_number_mean_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_ones_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_ones_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_ones_like_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_ones_like_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_permute_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_pow_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_prod_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_randn_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_select_scatter_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_sgn_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_sigmoid_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_sigmoid_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_sigmoid_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_sign_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_signbit_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_sin_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_slice_scatter_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_special_erfcx_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_special_i1e_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_special_log_ndtr_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_split_list_args_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_split_with_sizes_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_sqrt_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_squeeze_copy_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_squeeze_copy_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_squeeze_copy_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_squeeze_multiple_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_squeeze_multiple_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_std_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_sum_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_sum_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_t_copy_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_t_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_tanh_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_trace_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_transpose_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_tril_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_triu_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_trunc_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_trunc_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_unbind_copy_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_unbind_copy_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_unbind_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_unbind_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_unfold_copy_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_uniform_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_unsqueeze_copy_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_var_mean_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_var_unbiased_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_view_copy_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_view_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_view_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_where_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_zero__cuda_int8, test/test_decomp.py::TestDecompCUDA::test_rnn_decomp_module_nn_RNN_eval_mode_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_rnn_decomp_module_nn_RNN_eval_mode_cuda_float64, test/test_decomp.py::DecompOneOffTestsCUDA::test_sdpa_nn_functional_scaled_dot_product_attention_cuda_float64 2025-08-14T23:24:08.4357998Z 2025-08-14T23:24:08.4358362Z Running functorch/test_ops 8/9 ... [2025-08-14 23:24:08.390946] 2025-08-14T23:24:08.4359089Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T23:24:08.4360998Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'functorch/test_ops.py', '-m', 'not serial', '--shard-id=8', '--num-shards=9', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 23:24:08.391393] 2025-08-14T23:30:55.2918279Z 2025-08-14T23:30:55.2923885Z functorch/test_ops 4/9 was successful, full logs can be found in artifacts with path test/test-reports/functorch.test_ops_4.9_68b41ee704b1fb19_.log 2025-08-14T23:30:55.3609680Z Running 1133 items in this shard: test/functorch/test_ops.py::TestOperatorsCUDA::test_grad___radd___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_acosh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_addmv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_as_strided_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_bmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_byte_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_chalf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_copysign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_diag_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_diff_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_empty_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_eye_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_fft_fftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_fft_fftshift_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_fft_hfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_fft_ifftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_fmod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_gt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_igamma_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_igammac_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_index_fill_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_isclose_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_isinf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_eigvals_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_ldl_factor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_ldl_factor_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_lstsq_grad_oriented_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_lu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_matrix_rank_hermitian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_pinv_hermitian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_svd_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_tensorsolve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_vecdot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_log1p_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_logical_xor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_masked_cumsum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_masked_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_masked_normalize_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_masked_sum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_masked_var_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_meshgrid_list_of_tensors_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_ne_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_neg_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_adaptive_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_conv_transpose1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_embedding_bag_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_gaussian_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_hinge_embedding_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_instance_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_max_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_pad_circular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_pixel_unshuffle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_rrelu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_scaled_dot_product_attention_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_selu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_upsample_bilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_ones_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_ones_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_permute_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_real_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_round_decimals_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_scatter_reduce_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_scatter_reduce_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_sgn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_short_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_signal_windows_general_hamming_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_softmax_with_dtype_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_sort_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_sparse_sampled_addmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_special_bessel_j0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_special_modified_bessel_k0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_special_scaled_modified_bessel_k1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_split_list_args_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_sub_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_transpose_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_zeros_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_MulGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_NumpySortAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp___rmod___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp__segment_reduce_offsets_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp__upsample_bilinear2d_aa_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_add_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_addmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_any_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_argsort_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_bmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_bool_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_cat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_clamp_min_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_double_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_equal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_erfinv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_expand_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_fft_fft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_fft_fftshift_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_fft_hfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_fft_ifftshift_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_fft_irfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_fft_irfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_fft_rfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_float_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_floor_divide_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_fmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_gather_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_grid_sampler_2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_i0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_isinf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_isneginf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_isreal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_cholesky_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_eigh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_lu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_lu_factor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_matrix_rank_hermitian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_log10_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_logdet_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_logical_and_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_logsumexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_masked_cumsum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_masked_logaddexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_masked_softmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_matmul_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_mode_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_msort_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_mul_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_ne_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_new_full_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_adaptive_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_batch_norm_without_cudnn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_channel_shuffle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_conv2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_cosine_similarity_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_ctc_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_fractional_max_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_fractional_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_glu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_kl_div_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_logsigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_max_unpool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_max_unpool3d_grad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_mse_loss_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_pad_replicate_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_pairwise_distance_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_prelu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_rrelu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_selu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_silu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_polygamma_polygamma_n_1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_quantile_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_reshape_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_resize__cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_roll_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_signal_windows_cosine_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_signal_windows_nuttall_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_sin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_sinc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_softmax_with_dtype_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_special_chebyshev_polynomial_w_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_special_i1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_std_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_svd_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_svd_lowrank_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_t_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_tensor_split_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_uniform_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_view_as_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpjvpvmap_NumpyCubeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpjvpvmap_NumpyCubeNotComposableAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpjvpvmap_ScaleGradGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpjvpvmap_SelectGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_ZeroGradientsGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp__native_batch_norm_legit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp__segment_reduce_lengths_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_addmv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_angle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_argwhere_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_asin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_byte_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_cauchy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_cdouble_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_cosh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_diagonal_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_diff_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_digamma_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_empty_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_fft_hfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_fft_irfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_fft_rfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_fill_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_float_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_float_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_ge_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_gt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_index_reduce_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_int_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_ldl_factor_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_lstsq_grad_oriented_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_lu_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_matrix_power_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_logcumsumexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_masked_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_maximum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_min_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_multinomial_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_narrow_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_adaptive_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_channel_shuffle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_conv_transpose2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_embedding_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_hardswish_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_max_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_mish_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_multi_head_attention_forward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_normalize_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_pad_replicate_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_pixel_unshuffle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_relu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_scaled_dot_product_attention_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_selu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_softmin_with_dtype_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_randint_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_short_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_slice_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_special_i1e_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_special_modified_bessel_k0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_special_ndtri_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_split_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_svd_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_triangular_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_tril_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjpvmap_NumpyCubeNotComposableAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjpvmap_SelectAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvmapvmap_CubeGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvmapvmap_NumpyExpMarkDirtyAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_bool_raises_argmin_cuda_bool, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_bool_raises_ceil_cuda_bool, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_bool_raises_floor_cuda_bool, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_argmax_cuda_complex64, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_ceil_cuda_complex64, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_le_cuda_complex64, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_topk_cuda_complex128, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_broadcast_to_grad_op_jvp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_list_return_hsplit_grad_op_vjp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_list_return_split_grad_op_vjp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_positive_grad_op_jvp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_squeeze_multiple_grad_op_jvp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_unfold_grad_op_vjp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_view_as_grad_op_vjp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_SelectAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_T_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp___radd___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp__segment_reduce_lengths_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_addmv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_allclose_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_arange_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_atan_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_bool_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_clone_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_column_stack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_conj_physical_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_contiguous_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_cummax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_cumulative_trapezoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_diagonal_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_dot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_empty_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_exp2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_fft_fft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_float_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_float_power_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_frexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_hsplit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_hypot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_igammac_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_index_add_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_isclose_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linalg_cholesky_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linalg_multi_dot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linspace_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_log1p_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_log_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_logcumsumexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_logdet_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_logical_and_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_logsumexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_lu_unpack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_masked_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_matrix_exp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_meshgrid_variadic_tensors_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_min_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_mv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_mvlgamma_mvlgamma_p_1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_neg_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_adaptive_avg_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_adaptive_max_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_conv_transpose1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_embedding_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_embedding_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_interpolate_nearest_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_interpolate_trilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_linear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_margin_ranking_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_multi_head_attention_forward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_multi_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_softmin_with_dtype_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_ops_aten_index_put_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_put_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_randn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_scatter_reduce_sum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_signal_windows_blackman_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_signal_windows_general_hamming_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_signal_windows_hamming_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_sinc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_sort_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_special_bessel_y1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_special_entr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_special_i1e_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_special_modified_bessel_i1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_split_with_sizes_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_split_with_sizes_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_tile_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_topk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_unbind_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_unsafe_split_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_var_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_vdot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_view_as_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_view_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_view_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_SortGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp___getitem___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp__chunk_cat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp__native_batch_norm_legit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp__unsafe_masked_index_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp__unsafe_masked_index_put_accumulate_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_addbmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_all_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_asin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_atan_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_bfloat16_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_cauchy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_cdist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_cfloat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_clamp_max_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_cumulative_trapezoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_digamma_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_dsplit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_einsum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_erf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_exp2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_fft_fft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_fft_hfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_fft_irfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_frexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_geqrf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_half_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_isin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_isnan_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_isneginf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_jiterator_binary_return_by_ref_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_eigvals_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_inv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_lu_unpack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_masked_var_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_meshgrid_list_of_tensors_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_mode_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_movedim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nan_to_num_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nanmedian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nansum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_adaptive_avg_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_adaptive_avg_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_conv3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_layer_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_linear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_mse_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_mse_loss_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_normalize_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_pdist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_poisson_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_smooth_l1_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_softmin_with_dtype_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_softshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_norm_fro_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_ones_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_polar_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_pow_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_randn_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_round_decimals_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_round_decimals_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_scatter_reduce_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_scatter_reduce_sum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_searchsorted_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_short_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_signal_windows_hann_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_signbit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_special_bessel_y0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_special_scaled_modified_bessel_k0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_tril_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_triu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_unsqueeze_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_view_as_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_view_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjpvmap_NumpyTakeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjpvmap_SelectAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_CubeGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_NumpyCubeNotComposableAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_NumpyExpMarkDirtyAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_SelectAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap___rpow___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_abs_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_block_diag_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_cfloat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_char_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_chunk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_copysign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_cosh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_cumprod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_deg2rad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_diag_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_diagonal_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_diagonal_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_diff_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_div_trunc_rounding_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_empty_permuted_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_expand_as_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_expand_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_fft_irfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_flatten_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_flip_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_flipud_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_floor_divide_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_fmod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_gradient_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_gt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_inner_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_int_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_isinf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_ldexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_le_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_cross_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_eigvalsh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_inv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_lstsq_grad_oriented_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_lu_factor_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_log2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_log_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_logspace_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_long_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_masked_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_masked_fill_functorch_Scalar_only_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_masked_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_masked_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_masked_softmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_min_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_mv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nan_to_num_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nanquantile_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_native_batch_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_native_dropout_backward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_ne_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_adaptive_avg_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_adaptive_max_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_alpha_dropout_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_channel_shuffle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_conv2d_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_conv2d_stride_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_conv2d_strided_padding_dilation_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_conv2d_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_max_unpool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_max_unpool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_pad_replicate_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_norm_fro_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_polygamma_polygamma_n_4_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_remainder_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_round_decimals_neg_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_scatter_reduce_sum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_select_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_signal_windows_general_hamming_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_signal_windows_hann_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_signbit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_softmax_with_dtype_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_special_bessel_y0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_special_hermite_polynomial_he_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_special_laguerre_polynomial_l_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_special_scaled_modified_bessel_k1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_split_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_split_with_sizes_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_squeeze_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_sub_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_sum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_take_along_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_tensordot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_trace_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_transpose_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_trapezoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_unfold_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_unsqueeze_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_view_as_complex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_view_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmapvmap_ForwardHasDefaultArgsAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmapvmap_SelectAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmapvmap_SortGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_SortGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad___getitem___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad___rdiv___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad___rmod___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad___rpow___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_addmm_decomposed_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_all_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_allclose_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_aminmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_bool_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_byte_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_cartesian_prod_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_cauchy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_ceil_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_complex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_conj_physical_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_cummin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_diag_embed_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_diagonal_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_dot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_double_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_empty_permuted_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_equal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_erfc_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_exp2_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_exp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_expand_as_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_expand_copy_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_expand_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_rfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_flipud_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_float_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_floor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_frac_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_geqrf_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_grid_sampler_2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_hash_tensor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_heaviside_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_igammac_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_index_fill_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_index_reduce_prod_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_isfinite_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_jiterator_2inputs_2outputs_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_jiterator_binary_return_by_ref_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_lgamma_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_det_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_ldl_factor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_ldl_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_lu_factor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_lu_factor_ex_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_matrix_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_multi_dot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_norm_subgradients_at_zero_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_pinv_hermitian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_qr_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_vander_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_log2_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_logcumsumexp_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_logspace_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_cumprod_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_prod_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_select_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_softmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_matmul_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_max_reduction_no_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nanmean_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_narrow_copy_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_ne_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_new_empty_strided_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_conv2d_stride_padding_no_bias_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_conv2d_stride_padding_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_conv2d_stride_with_bias_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_cross_entropy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_dropout2d_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_hinge_embedding_loss_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_interpolate_area_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_interpolate_trilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_max_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_max_unpool1d_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_max_unpool2d_grad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_max_unpool2d_grad_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_mish_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_multi_head_attention_forward_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_multi_margin_loss_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_multilabel_margin_loss_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_nll_loss_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_pad_reflect_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_pairwise_distance_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_pdist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_poisson_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_relu_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_rrelu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_scaled_dot_product_attention_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_silu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_silu_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_softmin_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_softmin_with_dtype_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_softplus_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_triplet_margin_with_distance_loss_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_ops_aten_index_put_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_ops_aten_index_put_functorch_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_permute_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_polar_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_positive_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_qr_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_randint_like_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_randn_like_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_ravel_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_repeat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_reshape_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_resize__cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_resize_as__cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_resolve_conj_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_round_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_scatter_reduce_prod_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_short_functorch_no_channels_last_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_signal_windows_exponential_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_signal_windows_general_cosine_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_signal_windows_hann_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_signal_windows_kaiser_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_signal_windows_nuttall_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_slice_scatter_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_airy_ai_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_airy_ai_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_chebyshev_polynomial_w_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_hermite_polynomial_h_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_i0e_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_legendre_polynomial_p_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_modified_bessel_i0_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_modified_bessel_k0_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_modified_bessel_k1_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_spherical_bessel_j0_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_t_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_take_along_dim_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_tensordot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_transpose_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_transpose_copy_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_unbind_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_unsqueeze_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_var_mean_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_zero__cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_NumpyCubeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_NumpyCubeNotComposableAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_SelectAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall___rdiv___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_add_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_aminmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_atleast_1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_atleast_3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_diag_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_div_trunc_rounding_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_einsum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_empty_strided_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_half_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_H_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_SelectGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_ZeroGradientsGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule___getitem___functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule___radd___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule___rdiv___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule___rmul___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule___rpow___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule__unsafe_masked_index_put_accumulate_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_acosh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_allclose_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_as_strided_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_baddbmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_bfloat16_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_bfloat16_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_bmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_cholesky_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_cholesky_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_clamp_min_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_column_stack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_constant_pad_nd_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_digamma_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_dist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_double_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_dsplit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_empty_permuted_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_exponential_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_fft_ifft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_fft_ifftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_floor_divide_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_fmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_gradient_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_isclose_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_jiterator_2inputs_2outputs_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_cholesky_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_det_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_ldl_factor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_ldl_factor_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_matrix_rank_hermitian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_solve_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_log1p_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_log_softmax_with_dtype_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_logit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_masked_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_masked_cumsum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_masked_select_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_matrix_exp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_median_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_narrow_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_native_dropout_backward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_adaptive_max_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_dropout2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_embedding_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_feature_alpha_dropout_without_train_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_grid_sample_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_hardswish_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_instance_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_mse_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_multilabel_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_poisson_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_soft_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_softplus_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_softsign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nonzero_static_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_ops_aten__new_zeros_with_same_feature_meta_functorchonly_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_outer_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_pinverse_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_remainder_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_resolve_neg_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_sign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_signal_windows_blackman_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_slice_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_sparse_sampled_addmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_special_legendre_polynomial_p_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_special_modified_bessel_i1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_split_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_topk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_transpose_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_trapezoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_unsafe_chunk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_vdot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_view_as_complex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_view_as_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_histc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_igamma_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_isneginf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_le_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_lgamma_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_householder_product_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_ldl_factor_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_matrix_rank_hermitian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_pinv_hermitian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_logaddexp2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_logical_and_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_logit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_mT_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_masked_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_masked_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_masked_fill_functorch_Scalar_only_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_masked_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_min_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_mvlgamma_mvlgamma_p_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_bilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_conv3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_cosine_embedding_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_interpolate_bicubic_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_pad_circular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_pad_replicate_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_softmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nonzero_static_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_outer_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_pow_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_randint_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_ravel_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_resolve_conj_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_resolve_neg_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_scatter_reduce_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_signal_windows_hamming_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_sin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_sinh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_sparse_mm_reduce_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_i1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_scaled_modified_bessel_k1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_spherical_bessel_j0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_split_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_split_with_sizes_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_square_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_std_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_sub_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_sum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_tensordot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_true_divide_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_trunc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_unsqueeze_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_var_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_var_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_var_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_vdot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_vsplit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_zeros_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_NumpyCubeNotComposableAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp___getitem___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp__batch_norm_with_update_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_all_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_any_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_atleast_2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_bernoulli_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_bool_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_cartesian_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_cat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_clamp_min_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_conj_physical_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_diag_embed_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_diagflat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_double_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_empty_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_empty_permuted_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_erfc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_fft_ifft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_fft_ifft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_fft_rfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_flipud_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_frac_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_frexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_geqrf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_hstack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_index_add_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_index_put_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_isneginf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_item_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_jiterator_2inputs_2outputs_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_jiterator_4inputs_with_extra_args_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_lgamma_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_inv_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_lu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_matrix_rank_hermitian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linspace_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_log_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_lu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_masked_argmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_masked_cumprod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_masked_median_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_masked_normalize_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_masked_var_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_matmul_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_matrix_exp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_min_reduction_with_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_movedim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_mvlgamma_mvlgamma_p_1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_mvlgamma_mvlgamma_p_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nanquantile_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_narrow_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_adaptive_avg_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_adaptive_max_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_avg_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_avg_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_batch_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_gaussian_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_gelu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_interpolate_trilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_l1_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_max_unpool3d_grad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_pixel_unshuffle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_triplet_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_norm_nuc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_put_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_qr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_randint_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_reshape_as_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_resize__cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_scatter_reduce_sum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_signal_windows_exponential_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_signal_windows_hann_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_bessel_y1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_polygamma_special_polygamma_n_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_shifted_chebyshev_polynomial_v_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_xlog1py_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_square_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_stack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_std_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_take_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_transpose_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_unique_consecutive_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_unique_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_unsqueeze_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_CubeGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_NumpyTakeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_SelectGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp___radd___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp___rmod___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp___rpow___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp__segment_reduce_offsets_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_abs_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_allclose_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_as_strided_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_baddbmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_bfloat16_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_broadcast_tensors_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_corrcoef_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_deg2rad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_div_floor_rounding_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_einsum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_empty_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_exp2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_exp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_expand_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_eye_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_fft_irfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_float_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_NumpyCubeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule___rsub___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_argsort_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_as_strided_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_atan_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_atleast_1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_conj_physical_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_cos_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_cumprod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_empty_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_empty_permuted_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_expand_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_fft_fftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_fft_hfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_fft_ifft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_flatten_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_floor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_gradient_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_hash_tensor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_hstack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_index_reduce_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_index_reduce_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_le_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_cross_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_eigvalsh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_lu_factor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_pinv_singular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_logaddexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_lu_unpack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_mH_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_masked_logsumexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_masked_std_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_median_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_meshgrid_variadic_tensors_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_mvlgamma_mvlgamma_p_5_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nanmedian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nanquantile_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_neg_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_adaptive_avg_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_dropout3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_embedding_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_fractional_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_hardtanh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_interpolate_bicubic_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_max_unpool3d_grad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_scaled_dot_product_attention_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_triplet_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_upsample_bilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nonzero_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_ops_aten__new_zeros_with_same_feature_meta_functorchonly_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_ormqr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_remainder_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_round_decimals_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_scatter_reduce_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_scatter_reduce_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_short_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_signal_windows_bartlett_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_signal_windows_cosine_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_signal_windows_exponential_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_slice_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_special_i1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_split_list_args_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_squeeze_multiple_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_t_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_take_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_tril_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_unsqueeze_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_var_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_view_as_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_vsplit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_igamma_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_index_fill_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_index_reduce_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_ldexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_diagonal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_eigvals_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_householder_product_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_inv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_ldl_factor_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_qr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_tensorinv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_log10_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_logcumsumexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_logical_or_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_lu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_masked_fill_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_min_reduction_with_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_movedim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_mv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nextafter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_adaptive_avg_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_binary_cross_entropy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_embedding_bag_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_feature_alpha_dropout_without_train_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_interpolate_trilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_max_unpool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_multi_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_pad_replicate_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_softshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_threshold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_normal_number_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_ops_aten__new_zeros_with_same_feature_meta_functorchonly_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_pinverse_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_roll_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_select_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_signal_windows_exponential_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_sinc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_special_i1e_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_special_modified_bessel_i0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_special_ndtr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_sqrt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_std_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_t_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_take_along_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_trapezoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_tril_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_unflatten_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_view_as_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_view_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_xlogy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_acos_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_addcmul_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_addmv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_addr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_allclose_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_atan_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_baddbmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_broadcast_to_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_cartesian_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_clamp_min_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_column_stack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_conj_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_corrcoef_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_diagflat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_diagonal_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_empty_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_empty_permuted_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_exp2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_expand_as_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_fft_ifftshift_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_fft_irfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_fill_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_flip_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_float_power_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_heaviside_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_index_reduce_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_int_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_jiterator_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_le_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_cholesky_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_eig_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_matrix_rank_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_norm_subgradients_at_zero_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_lu_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_masked_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_masked_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_masked_softmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_masked_var_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_max_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_median_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_multinomial_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_mvlgamma_mvlgamma_p_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_neg_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_new_empty_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_adaptive_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_channel_shuffle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_cosine_embedding_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_dropout3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_grid_sample_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_hardswish_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_max_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_max_unpool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_relu6_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_softmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_softmin_with_dtype_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_tanhshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_norm_inf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_ones_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_quantile_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_randn_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_round_decimals_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_short_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_signal_windows_blackman_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_signal_windows_nuttall_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_special_chebyshev_polynomial_t_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_special_chebyshev_polynomial_u_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_special_chebyshev_polynomial_v_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_special_hermite_polynomial_he_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_special_i1e_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_special_modified_bessel_i1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_special_scaled_modified_bessel_k0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_std_mean_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_std_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_stft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_take_along_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_trapezoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_unflatten_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_unique_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_unsafe_chunk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_var_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_vsplit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_xlogy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvmap_NumpyCubeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvmap_SelectAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvmap_SortGenVmapAutogradFunction_cuda_float32 2025-08-14T23:30:55.4131432Z 2025-08-14T23:30:55.4131585Z Running nn/test_packed_sequence 1/1 ... [2025-08-14 23:30:55.295413] 2025-08-14T23:30:55.4131889Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T23:30:55.4132638Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'nn/test_packed_sequence.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 23:30:55.296088] 2025-08-14T23:30:59.5709752Z 2025-08-14T23:30:59.5710868Z nn/test_packed_sequence 1/1 was successful, full logs can be found in artifacts with path test/test-reports/nn.test_packed_sequence_1.1_3b033b68eb6f0c9a_.log 2025-08-14T23:30:59.5714644Z Running 12 items in this shard: test/nn/test_packed_sequence.py::PackedSequenceTest::test_pack_padded_sequence, test/nn/test_packed_sequence.py::PackedSequenceTest::test_pack_sequence, test/nn/test_packed_sequence.py::PackedSequenceTest::test_pad_sequence, test/nn/test_packed_sequence.py::PackedSequenceTest::test_pad_sequence_with_non_iterable_sequences, test/nn/test_packed_sequence.py::PackedSequenceTest::test_pad_sequence_with_tensor_sequences, test/nn/test_packed_sequence.py::PackedSequenceTest::test_to, test/nn/test_packed_sequence.py::PackedSequenceTest::test_to_memory_format, test/nn/test_packed_sequence.py::PackedSequenceTest::test_total_length, test/nn/test_packed_sequence.py::PackedSequenceTest::test_type_casts, test/nn/test_packed_sequence.py::PackedSequenceTest::test_unpack_sequence, test/nn/test_packed_sequence.py::PackedSequenceTest::test_unpad_sequence, test/nn/test_packed_sequence.py::PackedSequenceTest::test_wrong_order 2025-08-14T23:30:59.5720802Z 2025-08-14T23:30:59.5721142Z Running nn/test_pruning 1/1 ... [2025-08-14 23:30:59.570708] 2025-08-14T23:30:59.5721845Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T23:30:59.5724297Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'nn/test_pruning.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 23:30:59.571276] 2025-08-14T23:31:03.2450689Z 2025-08-14T23:31:03.2451537Z nn/test_pruning 1/1 was successful, full logs can be found in artifacts with path test/test-reports/nn.test_pruning_1.1_60e65dc7150f8f8b_.log 2025-08-14T23:31:03.2460100Z Running 34 items in this shard: test/nn/test_pruning.py::TestPruningNN::test_compute_nparams_to_prune, test/nn/test_pruning.py::TestPruningNN::test_custom_from_mask_pruning, test/nn/test_pruning.py::TestPruningNN::test_global_pruning, test/nn/test_pruning.py::TestPruningNN::test_global_pruning_importance_scores, test/nn/test_pruning.py::TestPruningNN::test_identity_pruning, test/nn/test_pruning.py::TestPruningNN::test_l1_unstructured_pruning, test/nn/test_pruning.py::TestPruningNN::test_l1_unstructured_pruning_with_importance_scores, test/nn/test_pruning.py::TestPruningNN::test_ln_structured_pruning, test/nn/test_pruning.py::TestPruningNN::test_ln_structured_pruning_importance_scores, test/nn/test_pruning.py::TestPruningNN::test_multiple_pruning_calls, test/nn/test_pruning.py::TestPruningNN::test_prune, test/nn/test_pruning.py::TestPruningNN::test_prune_importance_scores, test/nn/test_pruning.py::TestPruningNN::test_prune_importance_scores_mimic_default, test/nn/test_pruning.py::TestPruningNN::test_pruning_container, test/nn/test_pruning.py::TestPruningNN::test_pruning_container_compute_mask, test/nn/test_pruning.py::TestPruningNN::test_pruning_id_consistency, test/nn/test_pruning.py::TestPruningNN::test_pruning_rollback, test/nn/test_pruning.py::TestPruningNN::test_pruning_serialization_model, test/nn/test_pruning.py::TestPruningNN::test_pruning_serialization_state_dict, test/nn/test_pruning.py::TestPruningNN::test_random_pruning, test/nn/test_pruning.py::TestPruningNN::test_random_pruning_0perc, test/nn/test_pruning.py::TestPruningNN::test_random_pruning_forward, test/nn/test_pruning.py::TestPruningNN::test_random_pruning_new_weight, test/nn/test_pruning.py::TestPruningNN::test_random_pruning_orig, test/nn/test_pruning.py::TestPruningNN::test_random_pruning_pickle, test/nn/test_pruning.py::TestPruningNN::test_random_pruning_sizes, test/nn/test_pruning.py::TestPruningNN::test_random_structured_pruning_amount, test/nn/test_pruning.py::TestPruningNN::test_remove_pruning, test/nn/test_pruning.py::TestPruningNN::test_remove_pruning_exception, test/nn/test_pruning.py::TestPruningNN::test_remove_pruning_forward, test/nn/test_pruning.py::TestPruningNN::test_rnn_pruning, test/nn/test_pruning.py::TestPruningNN::test_unstructured_pruning_same_magnitude, test/nn/test_pruning.py::TestPruningNN::test_validate_pruning_amount, test/nn/test_pruning.py::TestPruningNN::test_validate_pruning_amount_init 2025-08-14T23:31:03.2468812Z 2025-08-14T23:31:03.2469001Z Running optim/test_optim 1/1 ... [2025-08-14 23:31:03.245055] 2025-08-14T23:31:03.2469367Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T23:31:03.2470294Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'optim/test_optim.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 23:31:03.245649] 2025-08-14T23:31:06.3650998Z 2025-08-14T23:31:06.3651623Z optim/test_optim 1/1 was successful, full logs can be found in artifacts with path test/test-reports/optim.test_optim_1.1_b5b03285d8093831_.log 2025-08-14T23:31:06.3652139Z 2025-08-14T23:31:06.3652329Z Running profiler/test_execution_trace 1/1 ... [2025-08-14 23:31:06.364989] 2025-08-14T23:31:06.3652701Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T23:31:06.3657960Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'profiler/test_execution_trace.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 23:31:06.365286] 2025-08-14T23:31:13.5950246Z 2025-08-14T23:31:13.5952938Z profiler/test_execution_trace 1/1 was successful, full logs can be found in artifacts with path test/test-reports/profiler.test_execution_trace_1.1_36bf118ccf45fb1b_.log 2025-08-14T23:31:13.5964736Z Running 13 items in this shard: test/profiler/test_execution_trace.py::TestExecutionTraceCUDA::test_execution_trace_alone_cuda, test/profiler/test_execution_trace.py::TestExecutionTraceCUDA::test_execution_trace_env_disabled_cuda, test/profiler/test_execution_trace.py::TestExecutionTraceCUDA::test_execution_trace_env_enabled_with_kineto_cuda, test/profiler/test_execution_trace.py::TestExecutionTraceCUDA::test_execution_trace_env_enabled_with_pt2_cuda, test/profiler/test_execution_trace.py::TestExecutionTraceCUDA::test_execution_trace_nested_tensor_cuda, test/profiler/test_execution_trace.py::TestExecutionTraceCUDA::test_execution_trace_no_capture_cuda, test/profiler/test_execution_trace.py::TestExecutionTraceCUDA::test_execution_trace_record_integral_tensor_data_cuda, test/profiler/test_execution_trace.py::TestExecutionTraceCUDA::test_execution_trace_record_integral_tensor_range_cuda, test/profiler/test_execution_trace.py::TestExecutionTraceCUDA::test_execution_trace_repeat_in_loop_cuda, test/profiler/test_execution_trace.py::TestExecutionTraceCUDA::test_execution_trace_start_stop_cuda, test/profiler/test_execution_trace.py::TestExecutionTraceCUDA::test_execution_trace_with_kineto_cuda, test/profiler/test_execution_trace.py::TestExecutionTraceCUDA::test_execution_trace_with_pt2_cuda, test/profiler/test_execution_trace.py::TestExecutionTraceCUDA::test_triton_fx_graph_with_et_cuda 2025-08-14T23:31:13.5975083Z 2025-08-14T23:31:13.5975488Z Running profiler/test_profiler_tree 1/1 ... [2025-08-14 23:31:13.594757] 2025-08-14T23:31:13.5976250Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T23:31:13.5978120Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'profiler/test_profiler_tree.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 23:31:13.595350] 2025-08-14T23:31:17.3189029Z 2025-08-14T23:31:17.3190653Z profiler/test_profiler_tree 1/1 was successful, full logs can be found in artifacts with path test/test-reports/profiler.test_profiler_tree_1.1_4aa6d17855324c49_.log 2025-08-14T23:31:17.3198049Z Running 10 items in this shard: test/profiler/test_profiler_tree.py::TestProfilerTree::test_profiler_experimental_tree, test/profiler/test_profiler_tree.py::TestProfilerTree::test_profiler_experimental_tree_cuda, test/profiler/test_profiler_tree.py::TestProfilerTree::test_profiler_experimental_tree_cuda_detailed, test/profiler/test_profiler_tree.py::TestProfilerTree::test_profiler_experimental_tree_cuda_with_stream, test/profiler/test_profiler_tree.py::TestProfilerTree::test_profiler_experimental_tree_with_memory, test/profiler/test_profiler_tree.py::TestProfilerTree::test_profiler_experimental_tree_with_memory_and_stack, test/profiler/test_profiler_tree.py::TestProfilerTree::test_profiler_experimental_tree_with_record_function, test/profiler/test_profiler_tree.py::TestProfilerTree::test_profiler_experimental_tree_with_stack_and_modules, test/profiler/test_profiler_tree.py::TestProfilerTree::test_profiler_experimental_tree_with_stack_and_torch_dispatch, test/profiler/test_profiler_tree.py::TestProfilerTree::test_profiler_experimental_tree_with_stack_and_torch_function 2025-08-14T23:31:17.3206957Z 2025-08-14T23:31:17.3207348Z Running profiler/test_python_tracer 1/1 ... [2025-08-14 23:31:17.318802] 2025-08-14T23:31:17.3208120Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T23:31:17.3209985Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'profiler/test_python_tracer.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 23:31:17.319479] 2025-08-14T23:31:24.2000120Z 2025-08-14T23:31:24.2002668Z profiler/test_python_tracer 1/1 was successful, full logs can be found in artifacts with path test/test-reports/profiler.test_python_tracer_1.1_625cb542d722330b_.log 2025-08-14T23:31:24.2005768Z Running 3 items in this shard: test/profiler/test_python_tracer.py::TestPythonTracer::test_method_with_c_function, test/profiler/test_python_tracer.py::TestPythonTracer::test_monitoring_callback, test/profiler/test_python_tracer.py::TestPythonTracer::test_unexpected_c_return_events 2025-08-14T23:31:24.2007826Z 2025-08-14T23:31:24.2008254Z Running profiler/test_record_function 1/1 ... [2025-08-14 23:31:24.199951] 2025-08-14T23:31:24.2009064Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T23:31:24.2010992Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'profiler/test_record_function.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 23:31:24.200571] 2025-08-14T23:31:27.8244031Z 2025-08-14T23:31:27.8245249Z profiler/test_record_function 1/1 was successful, full logs can be found in artifacts with path test/test-reports/profiler.test_record_function_1.1_96f0be9d58efe384_.log 2025-08-14T23:31:27.8249119Z Running 4 items in this shard: test/profiler/test_record_function.py::TestRecordFunction::test_datapipe_delegation_with_profiler, test/profiler/test_record_function.py::TestRecordFunction::test_datapipe_with_record_function, test/profiler/test_record_function.py::TestRecordFunction::test_datapipe_with_record_function_fork, test/profiler/test_record_function.py::TestRecordFunction::test_record_function 2025-08-14T23:31:27.8251214Z 2025-08-14T23:31:27.8251401Z Running profiler/test_torch_tidy 1/1 ... [2025-08-14 23:31:27.824164] 2025-08-14T23:31:27.8251728Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T23:31:27.8252515Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'profiler/test_torch_tidy.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 23:31:27.824767] 2025-08-14T23:31:35.0061057Z 2025-08-14T23:31:35.0061881Z profiler/test_torch_tidy 1/1 was successful, full logs can be found in artifacts with path test/test-reports/profiler.test_torch_tidy_1.1_e598b6a54005304b_.log 2025-08-14T23:31:35.0068004Z Running 22 items in this shard: test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_allocation_id_uniqueness, test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_allocation_ids, test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_allocation_ids_with_other_ops, test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_allocations, test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_extra_fields, test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_impl_reuse, test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_mkldnn_tensors, test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_module_and_optimizer_ids, test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_nnmodule_params, test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_optimizer, test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_optimizer_parameters_adam, test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_optimizer_parameters_sgd, test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_pointers_and_ids, test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_refcounts, test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_scalar_ins, test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_sparse_tensors, test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_tensor_lists, test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_tensor_properties, test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_tensorimpl_invalidation_full, test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_tensorimpl_invalidation_keep_alive, test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_tensorimpl_invalidation_scalar_args, test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_tensorimpl_invalidation_set 2025-08-14T23:31:35.0073388Z 2025-08-14T23:31:35.0073541Z Running test_accelerator 1/1 ... [2025-08-14 23:31:35.005926] 2025-08-14T23:31:35.0073835Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T23:31:35.0074586Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_accelerator.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 23:31:35.006517] 2025-08-14T23:31:39.3815153Z 2025-08-14T23:31:39.3816371Z test_accelerator 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_accelerator_1.1_801b02c7a26ed37e_.log 2025-08-14T23:31:39.3826288Z Running 11 items in this shard: test/test_accelerator.py::TestAccelerator::test_current_accelerator, test/test_accelerator.py::TestAccelerator::test_current_stream_query, test/test_accelerator.py::TestAccelerator::test_device_context_manager, test/test_accelerator.py::TestAccelerator::test_generic_event_behavior, test/test_accelerator.py::TestAccelerator::test_generic_multi_device_behavior, test/test_accelerator.py::TestAccelerator::test_generic_stream_behavior, test/test_accelerator.py::TestAccelerator::test_memory_stats, test/test_accelerator.py::TestAccelerator::test_multi_device_context_manager, test/test_accelerator.py::TestAccelerator::test_multi_device_stream_context_manager, test/test_accelerator.py::TestAccelerator::test_pin_memory_on_non_blocking_copy, test/test_accelerator.py::TestAccelerator::test_stream_context_manager 2025-08-14T23:31:39.3831503Z 2025-08-14T23:31:39.3831959Z Running test_ao_sparsity 1/1 ... [2025-08-14 23:31:39.381530] 2025-08-14T23:31:39.3832612Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T23:31:39.3834371Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_ao_sparsity.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 23:31:39.382136] 2025-08-14T23:32:19.7118549Z 2025-08-14T23:32:19.7119334Z test_ao_sparsity 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_ao_sparsity_1.1_c0c1152c4631ed4c_.log 2025-08-14T23:32:19.7143908Z Running 88 items in this shard: test/test_ao_sparsity.py::TestQuantizedSparseKernels::test_sparse_qlinear, test/test_ao_sparsity.py::TestQuantizedSparseLayers::test_sparse_qlinear, test/test_ao_sparsity.py::TestQuantizedSparseLayers::test_sparse_qlinear_serdes, test/test_ao_sparsity.py::TestFakeSparsity::test_jit_trace, test/test_ao_sparsity.py::TestFakeSparsity::test_masking_logic, test/test_ao_sparsity.py::TestFakeSparsity::test_state_dict_preserved, test/test_ao_sparsity.py::TestFakeSparsity::test_weights_parametrized, test/test_ao_sparsity.py::TestCubicScheduler::test_constructor, test/test_ao_sparsity.py::TestCubicScheduler::test_step, test/test_ao_sparsity.py::TestScheduler::test_constructor, test/test_ao_sparsity.py::TestScheduler::test_lambda_scheduler, test/test_ao_sparsity.py::TestScheduler::test_order_of_steps, test/test_ao_sparsity.py::TestScheduler::test_step, test/test_ao_sparsity.py::TestBaseSparsifier::test_constructor, test/test_ao_sparsity.py::TestBaseSparsifier::test_convert, test/test_ao_sparsity.py::TestBaseSparsifier::test_mask_squash, test/test_ao_sparsity.py::TestBaseSparsifier::test_mask_squash_with_params1, test/test_ao_sparsity.py::TestBaseSparsifier::test_mask_squash_with_params2, test/test_ao_sparsity.py::TestBaseSparsifier::test_mask_squash_with_params3, test/test_ao_sparsity.py::TestBaseSparsifier::test_prepare_config, test/test_ao_sparsity.py::TestBaseSparsifier::test_state_dict, test/test_ao_sparsity.py::TestBaseSparsifier::test_step, test/test_ao_sparsity.py::TestNearlyDiagonalSparsifier::test_constructor, test/test_ao_sparsity.py::TestNearlyDiagonalSparsifier::test_mask_squash, test/test_ao_sparsity.py::TestNearlyDiagonalSparsifier::test_prepare, test/test_ao_sparsity.py::TestNearlyDiagonalSparsifier::test_sparsity_levels, test/test_ao_sparsity.py::TestNearlyDiagonalSparsifier::test_step, test/test_ao_sparsity.py::TestWeightNormSparsifier::test_constructor, test/test_ao_sparsity.py::TestWeightNormSparsifier::test_mask_squash, test/test_ao_sparsity.py::TestWeightNormSparsifier::test_prepare, test/test_ao_sparsity.py::TestWeightNormSparsifier::test_sparsity_levels, test/test_ao_sparsity.py::TestWeightNormSparsifier::test_step, test/test_ao_sparsity.py::TestWeightNormSparsifier::test_step_2_of_4, test/test_ao_sparsity.py::TestBaseStructuredSparsifier::test_complex_conv2d, test/test_ao_sparsity.py::TestBaseStructuredSparsifier::test_constructor, test/test_ao_sparsity.py::TestBaseStructuredSparsifier::test_prepare_conv2d, test/test_ao_sparsity.py::TestBaseStructuredSparsifier::test_prepare_linear, test/test_ao_sparsity.py::TestBaseStructuredSparsifier::test_prune_conv2d_activation_conv2d, test/test_ao_sparsity.py::TestBaseStructuredSparsifier::test_prune_conv2d_bias_conv2d, test/test_ao_sparsity.py::TestBaseStructuredSparsifier::test_prune_conv2d_conv2d, test/test_ao_sparsity.py::TestBaseStructuredSparsifier::test_prune_conv2d_padding_conv2d, test/test_ao_sparsity.py::TestBaseStructuredSparsifier::test_prune_conv2d_pool_conv2d, test/test_ao_sparsity.py::TestBaseStructuredSparsifier::test_prune_linear_activation_linear, test/test_ao_sparsity.py::TestBaseStructuredSparsifier::test_prune_linear_bias_linear, test/test_ao_sparsity.py::TestBaseStructuredSparsifier::test_prune_linear_linear, test/test_ao_sparsity.py::TestBaseStructuredSparsifier::test_prune_lstm_layernorm_linear_multiple_layer, test/test_ao_sparsity.py::TestBaseStructuredSparsifier::test_prune_lstm_layernorm_linear_single_layer, test/test_ao_sparsity.py::TestBaseStructuredSparsifier::test_prune_lstm_linear_multiple_layer, test/test_ao_sparsity.py::TestBaseStructuredSparsifier::test_prune_lstm_linear_single_layer, test/test_ao_sparsity.py::TestBaseStructuredSparsifier::test_step_conv2d, test/test_ao_sparsity.py::TestBaseStructuredSparsifier::test_step_linear, test/test_ao_sparsity.py::TestFPGMPruner::test_compute_distance, test/test_ao_sparsity.py::TestFPGMPruner::test_update_mask, test/test_ao_sparsity.py::TestSaliencyPruner::test_lstm_saliency_pruner_update_mask, test/test_ao_sparsity.py::TestSaliencyPruner::test_saliency_pruner_update_mask, test/test_ao_sparsity.py::TestComposability::test_convert_without_squash_mask, test/test_ao_sparsity.py::TestComposability::test_fusion_before_s_prep, test/test_ao_sparsity.py::TestComposability::test_q_prep_before_s_prep, test/test_ao_sparsity.py::TestComposability::test_qat_prep_before_s_prep, test/test_ao_sparsity.py::TestComposability::test_s_prep_before_fusion, test/test_ao_sparsity.py::TestComposability::test_s_prep_before_q_prep, test/test_ao_sparsity.py::TestComposability::test_s_prep_before_qat_prep, test/test_ao_sparsity.py::TestFxComposability::test_q_prep_fx_before_s_prep, test/test_ao_sparsity.py::TestFxComposability::test_q_prep_fx_s_prep_ref_conv, test/test_ao_sparsity.py::TestFxComposability::test_s_prep_before_q_prep_fx, test/test_ao_sparsity.py::TestFxComposability::test_s_prep_before_qat_prep_fx, test/test_ao_sparsity.py::TestFxComposability::test_s_prep_q_prep_fx_ref, test/test_ao_sparsity.py::TestActivationSparsifier::test_activation_sparsifier, test/test_ao_sparsity.py::TestBaseDataScheduler::test_constructor, test/test_ao_sparsity.py::TestBaseDataScheduler::test_order_of_steps, test/test_ao_sparsity.py::TestBaseDataScheduler::test_state_dict, test/test_ao_sparsity.py::TestBaseDataScheduler::test_step, test/test_ao_sparsity.py::TestBaseDataSparsifier::test_nn_embeddings, test/test_ao_sparsity.py::TestBaseDataSparsifier::test_nn_parameters, test/test_ao_sparsity.py::TestBaseDataSparsifier::test_tensors, test/test_ao_sparsity.py::TestNormDataSparsifiers::test_nn_embeddings, test/test_ao_sparsity.py::TestNormDataSparsifiers::test_nn_parameters, test/test_ao_sparsity.py::TestNormDataSparsifiers::test_tensors, test/test_ao_sparsity.py::TestQuantizationUtils::test_ptq_quantize_first, test/test_ao_sparsity.py::TestQuantizationUtils::test_ptq_sparsify_first, test/test_ao_sparsity.py::TestSparsityUtilFunctions::test_fqn_to_module, test/test_ao_sparsity.py::TestSparsityUtilFunctions::test_fqn_to_module_fail, test/test_ao_sparsity.py::TestSparsityUtilFunctions::test_fqn_to_module_for_tensors, test/test_ao_sparsity.py::TestSparsityUtilFunctions::test_get_arg_info_from_tensor_fqn, test/test_ao_sparsity.py::TestSparsityUtilFunctions::test_get_arg_info_from_tensor_fqn_fail, test/test_ao_sparsity.py::TestSparsityUtilFunctions::test_module_to_fqn, test/test_ao_sparsity.py::TestSparsityUtilFunctions::test_module_to_fqn_fail, test/test_ao_sparsity.py::TestSparsityUtilFunctions::test_module_to_fqn_root 2025-08-14T23:32:19.7166055Z 2025-08-14T23:32:19.7166288Z Running test_appending_byte_serializer 1/1 ... [2025-08-14 23:32:19.712128] 2025-08-14T23:32:19.7166696Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T23:32:19.7167815Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_appending_byte_serializer.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 23:32:19.712720] 2025-08-14T23:32:22.7038069Z 2025-08-14T23:32:22.7039827Z functorch/test_ops 8/9 was successful, full logs can be found in artifacts with path test/test-reports/functorch.test_ops_8.9_8e6a568caa606e9e_.log 2025-08-14T23:32:22.7771759Z Running 1150 items in this shard: test/functorch/test_ops.py::TestOperatorsCUDA::test_extremal_numerics_l1_loss_cuda, test/functorch/test_ops.py::TestOperatorsCUDA::test_extremal_numerics_mse_loss_cuda, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_NumpyTakeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad___rdiv___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad___rsub___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad__unsafe_masked_index_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_addmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_allclose_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_bernoulli_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_cdist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_cfloat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_cumprod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_cumsum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_cumulative_trapezoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_empty_permuted_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_erfc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_erfinv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_exp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_fft_fft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_fft_hfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_fft_ifftshift_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_fft_ihfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_fft_irfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_fill_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_fliplr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_float_power_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_jiterator_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_jiterator_binary_return_by_ref_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_le_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_lerp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_pinv_singular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_solve_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_vector_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_log_softmax_with_dtype_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_logical_not_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_logical_or_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_long_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_masked_argmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_masked_fill_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_masked_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_maximum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_min_reduction_with_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nanmedian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nansum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_native_batch_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_new_empty_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_adaptive_avg_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_channel_shuffle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_conv2d_strided_padding_dilation_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_conv3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_cosine_embedding_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_dropout3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_elu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_hardshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_interpolate_bilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_linear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_margin_ranking_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_pixel_shuffle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_rms_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_smooth_l1_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_softshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_threshold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nonzero_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nonzero_static_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_norm_nuc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_permute_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_polygamma_polygamma_n_2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_rand_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_randint_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_randint_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_randn_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_reciprocal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_scatter_reduce_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_select_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_sin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_special_modified_bessel_i1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_std_mean_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_svd_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_t_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_take_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_tan_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_NumpyMulAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp___rsub___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_acosh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_addbmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_addr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_all_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_aminmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_argmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_as_strided_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_as_strided_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_ceil_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_column_stack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_complex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_conj_physical_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_count_nonzero_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_diagonal_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_diagonal_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_div_trunc_rounding_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_double_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_empty_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_fill_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_index_reduce_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_index_reduce_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_jiterator_4inputs_with_extra_args_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_cross_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_diagonal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_eig_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_householder_product_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_lstsq_grad_oriented_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_solve_triangular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_tensorinv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_log1p_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_logcumsumexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_logit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_masked_cumprod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_masked_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_masked_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_maximum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_median_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_min_reduction_no_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_native_layer_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_new_empty_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_avg_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_batch_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_conv_transpose2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_elu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_hardshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_hardtanh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_instance_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_mish_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_relu6_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_softmin_with_dtype_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_upsample_bilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_ones_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_put_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_scatter_add_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_sign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_sparse_mm_reduce_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_special_bessel_j0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_special_bessel_j1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_special_entr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_special_legendre_polynomial_p_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_special_modified_bessel_k0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_special_shifted_chebyshev_polynomial_v_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_special_shifted_chebyshev_polynomial_w_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_special_xlog1py_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_special_zeta_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_squeeze_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_sub_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_t_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_transpose_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_var_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_view_as_complex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_vsplit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpjvpvmap_MulGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_H_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_NumpyMulAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp___rmul___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_as_strided_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_asinh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_broadcast_tensors_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_chalf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_cholesky_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_cumprod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_expand_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_fft_fft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_fft_ifftshift_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_fft_ihfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_fft_irfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_float_power_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_floor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_full_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_full_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_half_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_jiterator_binary_return_by_ref_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_ldexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_le_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_lerp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_cholesky_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_householder_product_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_lstsq_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_pinv_singular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_log1p_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_lu_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_masked_logsumexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_masked_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_masked_std_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_meshgrid_list_of_tensors_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_mode_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nansum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_native_batch_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_adaptive_avg_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_adaptive_max_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_avg_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_batch_norm_without_cudnn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_celu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_conv2d_stride_groups_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_conv2d_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_dropout2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_embedding_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_hardtanh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_interpolate_bilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_local_response_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_margin_ranking_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_multi_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_pad_circular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_soft_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_normal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_normal_number_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_outer_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_permute_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_polar_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_real_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_renorm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_reshape_as_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_resize__cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_resolve_neg_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_round_decimals_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_round_decimals_neg_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_scatter_reduce_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_short_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_sign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_sinh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_special_bessel_y0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_special_bessel_y1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_split_list_args_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_take_along_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_transpose_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_trapezoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_unflatten_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_unsqueeze_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_xlogy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvmap_CubeGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvmap_NumpyCubeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvmap_NumpySortAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvmap_ScaleGradGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvmap_ZeroGradientsGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvmapvmap_MulGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvmapvmap_SelectAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvmapvmap_SortGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvmapvmap_ZeroGradientsGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_amin_cuda_complex32, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_argmax_cuda_complex32, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_clamp_cuda_complex128, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_floor_cuda_complex32, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_ge_cuda_complex64, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_diagonal_grad_op_vjp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_list_return_hsplit_grad_op_jvp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_mH_grad_op_vjp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_real_grad_op_vjp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_reshape_grad_op_vjp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_resolve_neg_grad_op_jvp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_select_grad_op_vjp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_squeeze_grad_op_jvp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_transpose_grad_op_vjp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_unflatten_grad_op_jvp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_acos_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_acosh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_atanh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_block_diag_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_cdist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_char_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_cosh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_cummin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_diagonal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_diagonal_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_diff_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_digamma_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_div_no_rounding_mode_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_erf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_eye_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_fft_hfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_fft_ifft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_geometric_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_gradient_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_grid_sampler_2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_half_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_half_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_hstack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_int_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_kthvalue_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linalg_diagonal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linalg_eigh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linalg_inv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linalg_ldl_factor_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linalg_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linalg_vector_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_masked_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nanmean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_new_empty_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_new_full_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_new_ones_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_batch_norm_without_cudnn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_bilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_binary_cross_entropy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_cosine_similarity_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_logsigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_max_unpool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_multilabel_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_normalize_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_pad_circular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_pad_replicate_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_softsign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nonzero_static_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_norm_fro_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_polygamma_polygamma_n_4_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_ravel_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_resize_as__cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_round_decimals_neg_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_scatter_add_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_short_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_signal_windows_bartlett_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_special_bessel_j0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_special_i0e_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_special_zeta_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_squeeze_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_squeeze_multiple_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_triangular_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_unbind_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_unsqueeze_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_vsplit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_vstack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_zeros_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_MulGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_NumpyExpMarkDirtyAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp___rsub___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp__batch_norm_with_update_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_abs_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_atleast_2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_baddbmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_byte_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_cholesky_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_clamp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_conj_physical_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_copysign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_cosh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_cummax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_diagflat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_diagonal_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_empty_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_expand_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_fft_ifft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_fft_ihfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_full_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_grid_sampler_2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_index_fill_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_index_reduce_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_isfinite_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_isposinf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_jiterator_2inputs_2outputs_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_cholesky_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_cond_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_cross_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_inv_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_lstsq_grad_oriented_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_lu_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_matrix_rank_hermitian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_norm_subgradients_at_zero_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_slogdet_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_solve_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linspace_tensor_overload_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_log10_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_logspace_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_long_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_lt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_masked_argmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_masked_cumsum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_masked_softmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_min_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_min_reduction_no_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_mul_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_narrow_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_new_full_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_avg_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_conv2d_stride_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_cosine_similarity_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_ctc_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_elu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_embedding_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_fractional_max_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_gelu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_hardsigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_huber_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_leaky_relu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_max_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_rms_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_selu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nonzero_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_ops_aten_index_put_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_ormqr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_pca_lowrank_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_randn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_renorm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_reshape_as_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_round_decimals_neg_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_rsqrt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_scatter_reduce_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_scatter_reduce_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_select_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_short_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_signal_windows_cosine_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_signal_windows_kaiser_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_slice_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_special_legendre_polynomial_p_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_special_polygamma_special_polygamma_n_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_special_spherical_bessel_j0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_special_zeta_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_split_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_std_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_stft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_svd_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_tanh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_topk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_torch_ops_aten__efficient_attention_forward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_unfold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_unsqueeze_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_var_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_xlogy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjpvmap_NumpySortAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_NumpySortAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap__chunk_cat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap__unsafe_masked_index_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_allclose_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_argwhere_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_asin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_bool_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_cauchy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_char_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_count_nonzero_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_cross_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_cummax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_diagflat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_dist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_dsplit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_expand_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_fft_fft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_floor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_geqrf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_half_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_hash_tensor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_heaviside_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_i0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_igammac_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_index_reduce_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_isin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_jiterator_binary_return_by_ref_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_kron_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_ldl_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_lu_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_matrix_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_norm_subgradients_at_zero_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_pinv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_pinv_hermitian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_pinv_singular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_vander_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_log10_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_logit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_masked_log_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_masked_sum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_min_reduction_with_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_movedim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_multinomial_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_avg_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_batch_norm_without_cudnn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_conv2d_stride_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_conv2d_strided_padding_dilation_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_conv_transpose2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_cosine_similarity_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_kl_div_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_layer_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_max_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_pdist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_poisson_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_relu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_softmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_normal_number_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_permute_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_polygamma_polygamma_n_1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_polygamma_polygamma_n_2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_polygamma_polygamma_n_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_randn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_ravel_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_resolve_conj_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_rot90_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_round_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_signal_windows_nuttall_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_sin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_special_chebyshev_polynomial_u_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_special_i0e_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_special_modified_bessel_k0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_special_shifted_chebyshev_polynomial_u_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_special_shifted_chebyshev_polynomial_v_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_special_zeta_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_std_mean_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_svd_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_tile_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_unsqueeze_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmapvmap_NumpyCubeNotComposableAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_NumpyMulAutogradFunction_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad___rmul___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad__chunk_cat_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad__native_batch_norm_legit_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad__unsafe_masked_index_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad__unsafe_masked_index_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_aminmax_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_arange_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_argwhere_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_as_strided_copy_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_asin_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_atanh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_atleast_2d_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_baddbmm_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_bool_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_broadcast_tensors_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_bucketize_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_cdist_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_cdouble_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_char_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_char_functorch_no_channels_last_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_chunk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_clone_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_conj_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_copysign_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_cov_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_cumprod_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_diag_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_diagflat_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_diagonal_copy_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_diff_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_dist_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_div_no_rounding_mode_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_dsplit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_empty_strided_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_equal_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_erfinv_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_expand_as_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_fft2_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_fft_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_fftshift_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_ifftshift_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_irfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_rfft2_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_flip_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fliplr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_flipud_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_float_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_floor_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_full_like_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_gather_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_geometric_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_gt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_half_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_hstack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_index_put_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_index_reduce_amax_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_isfinite_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_kron_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_ldexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_ldexp_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_lerp_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_cholesky_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_cholesky_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_eigvalsh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_ldl_factor_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_qr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_solve_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_svd_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_svdvals_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_tensorsolve_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_vector_norm_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linspace_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linspace_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linspace_tensor_overload_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_log1p_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_logdet_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_logical_xor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_logit_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_logspace_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_argmin_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_cumprod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_logaddexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_norm_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_sum_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_max_reduction_no_dim_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_max_reduction_with_dim_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_maximum_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_median_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_meshgrid_list_of_tensors_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_meshgrid_variadic_tensors_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_min_binary_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_min_reduction_no_dim_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_mm_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_multinomial_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_multinomial_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_mv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_narrow_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_neg_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_adaptive_avg_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_adaptive_max_pool2d_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_avg_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_avg_pool3d_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_binary_cross_entropy_with_logits_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_conv2d_no_bias_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_conv2d_strided_padding_dilation_no_bias_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_conv2d_strided_padding_dilation_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_conv2d_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_conv_transpose1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_conv_transpose1d_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_cosine_embedding_loss_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_fractional_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_glu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_grid_sample_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_hardsigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_hardtanh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_huber_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_interpolate_bicubic_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_kl_div_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_max_unpool3d_grad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_multilabel_soft_margin_loss_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_pad_constant_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_relu6_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_selu_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_unfold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_norm_fro_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_norm_nuc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_normal_in_place_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_ormqr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_ormqr_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_polygamma_polygamma_n_3_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_reciprocal_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_repeat_interleave_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_rsqrt_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_scatter_reduce_amin_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_short_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_signal_windows_gaussian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_signal_windows_general_hamming_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_signal_windows_hamming_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_softmax_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_softmax_with_dtype_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_sort_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_sparse_sampled_addmm_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_entr_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_erfcx_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_hermite_polynomial_he_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_modified_bessel_k1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_scaled_modified_bessel_k1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_square_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_std_mean_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_stft_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_sum_to_size_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_sum_to_size_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_svd_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_tensordot_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_torch_ops_aten__efficient_attention_forward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_torch_ops_aten__safe_softmax_default_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_trace_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_trapz_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_triu_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_trunc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_trunc_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_unflatten_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_unsafe_split_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_unsqueeze_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_var_unbiased_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_vdot_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_view_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_vstack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_xlogy_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_T_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall___getitem___functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_acos_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_addbmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_any_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_as_strided_partial_views_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_bfloat16_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_block_diag_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_bmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_bool_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_cholesky_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_conj_physical_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_cross_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_deg2rad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_diagonal_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_dsplit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_exp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_fft_rfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_ge_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_grid_sampler_2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_half_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_SelectAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_T_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule__segment_reduce_offsets_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule__softmax_backward_data_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule__unsafe_masked_index_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_addmm_decomposed_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_all_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_angle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_argmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_as_strided_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_as_strided_partial_views_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_atan2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_block_diag_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_bool_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_broadcast_tensors_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_cauchy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_cosh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_cov_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_cross_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_cumulative_trapezoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_diag_embed_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_double_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_fft_hfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_fft_irfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_geometric_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_heaviside_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_histc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_index_reduce_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_jiterator_4inputs_with_extra_args_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_kthvalue_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_lgamma_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_eig_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_svdvals_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_vander_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_vecdot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_log_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_logaddexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_logdet_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_logical_not_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_logical_or_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_logsumexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_masked_cumprod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_masked_logsumexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_max_pool2d_with_indices_backward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_meshgrid_list_of_tensors_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_meshgrid_variadic_tensors_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_mm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_mode_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_movedim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nan_to_num_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_native_batch_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_adaptive_avg_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_adaptive_max_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_celu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_conv2d_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_conv2d_stride_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_conv2d_strided_padding_dilation_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_fractional_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_glu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_local_response_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_max_unpool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_mse_loss_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_multi_head_attention_forward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_normalize_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_unfold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_polar_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_polygamma_polygamma_n_1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_repeat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_resize_as__cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_scatter_reduce_sum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_short_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_sigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_signal_windows_exponential_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_slice_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_softmax_with_dtype_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_special_ndtri_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_special_xlog1py_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_torch_ops_aten__efficient_attention_forward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_trunc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_var_mean_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_where_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_xlogy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_index_reduce_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_index_reduce_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_isinf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_jiterator_binary_return_by_ref_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_eigvals_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_lstsq_grad_oriented_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_solve_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_tensorsolve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_vector_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_log_softmax_with_dtype_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_long_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_lu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_masked_cumsum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_masked_normalize_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_masked_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_max_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_movedim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_multinomial_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_mv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nanmean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_new_ones_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_binary_cross_entropy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_conv2d_stride_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_dropout2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_embedding_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_grid_sample_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_hinge_embedding_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_huber_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_interpolate_nearest-exact_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_interpolate_nearest_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_multilabel_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_relu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_silu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_smooth_l1_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_soft_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_normal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_positive_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_remainder_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_reshape_as_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_resize__cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_scalar_tensor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_select_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_signal_windows_general_cosine_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_signbit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_i1e_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_xlog1py_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_stack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_svd_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_torch_ops_aten__safe_softmax_default_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_triangular_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_where_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_NumpySortAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_ScaleGradGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_ZeroGradientsGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp___rmod___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp__segment_reduce_lengths_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_abs_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_acos_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_argmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_as_strided_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_asin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_block_diag_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_broadcast_to_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_char_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_column_stack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_cumsum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_dstack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_erfinv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_fft_fft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_fft_ifftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_fill_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_fmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_full_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_i0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_int_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_isposinf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_ldexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_lerp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_cross_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_diagonal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_eigvals_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_matrix_power_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_tensorinv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_tensorsolve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linspace_tensor_overload_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_logit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_lu_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_masked_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_masked_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_min_reduction_no_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_mul_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_multinomial_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_mvlgamma_mvlgamma_p_5_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_ne_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_conv2d_stride_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_dropout_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_max_unpool1d_grad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_normalize_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_rms_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_rrelu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_selu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_unfold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_pinverse_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_ravel_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_repeat_interleave_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_resolve_neg_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_rot90_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_select_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_short_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_sign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_signal_windows_gaussian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_chebyshev_polynomial_w_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_hermite_polynomial_h_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_i1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_laguerre_polynomial_l_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_modified_bessel_k0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_modified_bessel_k1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_squeeze_multiple_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_tile_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_to_sparse_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_unbind_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_unsqueeze_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_zeros_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvmap_NumpyExpMarkDirtyAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvmap_NumpySortAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvmap_SelectAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_NumpyCubeNotComposableAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_SelectAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_ZeroGradientsGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp___getitem___functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp__segment_reduce_lengths_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_addbmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_addcdiv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_any_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_byte_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_cfloat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_char_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_clamp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_cumulative_trapezoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_diagonal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_dist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_erf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_fft_ihfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_fft_rfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_fill_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_floor_divide_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_SortGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_T_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_ZeroGradientsGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule___rpow___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule__batch_norm_with_update_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule__chunk_cat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule__native_batch_norm_legit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_abs_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_argwhere_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_broadcast_shapes_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_byte_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_cdist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_char_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_clamp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_clone_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_complex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_conj_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_cross_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_div_no_rounding_mode_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_dot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_exp2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_fft_ifftshift_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_fft_ihfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_fft_rfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_flipud_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_float_power_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_geqrf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_inner_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_int_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_isfinite_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_item_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_jiterator_4inputs_with_extra_args_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_eig_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_ldl_factor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_lu_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_multi_dot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_pinv_hermitian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_slogdet_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_svdvals_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_logical_and_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_logical_not_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_long_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_masked_fill_functorch_Scalar_only_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_masked_softmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_ne_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_new_full_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_new_zeros_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_adaptive_max_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_batch_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_conv2d_stride_depthwise_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_conv2d_stride_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_conv2d_strided_padding_dilation_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_elu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_embedding_bag_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_glu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_group_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_interpolate_bilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_kl_div_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_local_response_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_mish_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_normalize_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_pairwise_distance_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_pixel_shuffle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_poisson_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_prelu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_relu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_softmin_with_dtype_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_normal_in_place_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_polygamma_polygamma_n_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_pow_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_put_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_scatter_add_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_searchsorted_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_sgn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_short_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_special_bessel_j0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_special_chebyshev_polynomial_t_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_special_entr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_special_xlog1py_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_tensordot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_trace_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_unique_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_unsafe_chunk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_hash_tensor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_igammac_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_isposinf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_inv_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_lstsq_grad_oriented_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_lu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_norm_subgradients_at_zero_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_svd_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_log_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_logaddexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_logspace_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_long_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_masked_logsumexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_masked_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_masked_median_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_masked_var_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_mode_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nanquantile_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_conv2d_stride_depthwise_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_cosine_embedding_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_cosine_similarity_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_cross_entropy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_gaussian_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_gelu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_hardtanh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_interpolate_linear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_max_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_pad_reflect_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_pairwise_distance_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_poisson_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_tanhshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_normal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_polygamma_polygamma_n_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_quantile_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_rad2deg_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_reshape_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_resize__cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_scatter_reduce_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_sort_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_special_airy_ai_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_special_chebyshev_polynomial_t_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_special_legendre_polynomial_p_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_special_modified_bessel_i1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_special_shifted_chebyshev_polynomial_v_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_split_list_args_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_squeeze_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_to_sparse_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_torch_ops_aten__safe_softmax_default_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_trace_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_unfold_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_unique_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_CubeGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_NumpyMulAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp___getitem___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp___rmatmul___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_argwhere_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_atleast_1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_bmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_bool_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_broadcast_shapes_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_cat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_ceil_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_cholesky_inverse_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_cosh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_cov_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_cross_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_cumprod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_cumulative_trapezoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_deg2rad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_diagonal_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_dist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_div_no_rounding_mode_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_empty_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_erfinv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_fft_fftshift_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_fft_ifft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_fft_ihfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_floor_divide_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_frac_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_half_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_index_select_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_isinf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_kthvalue_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_ldl_factor_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_matrix_rank_hermitian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_vander_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linspace_tensor_overload_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_logical_or_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_masked_fill_functorch_Scalar_only_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_masked_median_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_masked_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_masked_std_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_meshgrid_variadic_tensors_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_min_reduction_no_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_msort_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_mul_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_native_batch_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_adaptive_max_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_batch_norm_without_cudnn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_celu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_conv2d_stride_depthwise_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_conv2d_stride_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_conv2d_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_cross_entropy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_fractional_max_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_fractional_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_group_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_max_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_max_unpool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_pad_circular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_selu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nonzero_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_norm_fro_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_normal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_ormqr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_polygamma_polygamma_n_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_polygamma_polygamma_n_4_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_randint_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_rot90_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_round_decimals_neg_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_scatter_reduce_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_sigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_signbit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_special_bessel_j0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_special_entr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_sub_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_tan_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_tile_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_torch_ops_aten__efficient_attention_forward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_unsqueeze_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_vdot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvmap_NumpyTakeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvmapjvp_linalg_solve_cuda 2025-08-14T23:32:22.8329927Z 2025-08-14T23:32:22.8330182Z Running test_autoload 1/1 ... [2025-08-14 23:32:22.706023] 2025-08-14T23:32:22.8330721Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T23:32:22.8332090Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_autoload.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 23:32:22.706344] 2025-08-14T23:32:23.2363746Z 2025-08-14T23:32:23.2365305Z test_appending_byte_serializer 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_appending_byte_serializer_1.1_479c37be61a14ac3_.log 2025-08-14T23:32:23.2368060Z Running 3 items in this shard: test/test_appending_byte_serializer.py::TestAppendingByteSerializer::test_checksum, test/test_appending_byte_serializer.py::TestAppendingByteSerializer::test_write_and_read_class, test/test_appending_byte_serializer.py::TestAppendingByteSerializer::test_write_and_read_int 2025-08-14T23:32:23.2369856Z 2025-08-14T23:32:23.2370163Z Running test_binary_ufuncs 1/1 ... [2025-08-14 23:32:23.236535] 2025-08-14T23:32:23.2370738Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T23:32:23.2372248Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_binary_ufuncs.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 23:32:23.236838] 2025-08-14T23:32:26.2277585Z 2025-08-14T23:32:26.2278190Z test_autoload 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_autoload_1.1_c5d0ecc67f09a44a_.log 2025-08-14T23:32:26.2279164Z Running 1 items in this shard: test/test_autoload.py::TestDeviceBackendAutoload::test_autoload 2025-08-14T23:32:26.2279622Z 2025-08-14T23:32:26.2279869Z Running test_functional_optim 1/1 ... [2025-08-14 23:32:26.227820] 2025-08-14T23:32:26.2280382Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T23:32:26.2285127Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_functional_optim.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 23:32:26.228262] 2025-08-14T23:32:29.9506466Z 2025-08-14T23:32:29.9508234Z test_functional_optim 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_functional_optim_1.1_d3e4816447749315_.log 2025-08-14T23:32:29.9512702Z Running 4 items in this shard: test/test_functional_optim.py::TestFunctionalOptimParity::test_functional_optim_parity_adam, test/test_functional_optim.py::TestFunctionalOptimParity::test_functional_optim_parity_adam_w, test/test_functional_optim.py::TestFunctionalOptimParity::test_functional_optim_parity_sgd, test/test_functional_optim.py::TestFunctionalOptimParity::test_functional_optim_registration 2025-08-14T23:32:29.9515826Z 2025-08-14T23:32:29.9518693Z Running test_functionalization 1/1 ... [2025-08-14 23:32:29.950241] 2025-08-14T23:32:29.9519525Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T23:32:29.9521632Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_functionalization.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 23:32:29.950768] 2025-08-14T23:32:41.4846280Z 2025-08-14T23:32:41.4847740Z test_functionalization 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_functionalization_1.1_cd2546fa557f8004_.log 2025-08-14T23:32:41.4881676Z Running 112 items in this shard: test/test_functionalization.py::TestFunctionalization::test_advanced_indexing, test/test_functionalization.py::TestFunctionalization::test_advanced_indexing_correct_strides, test/test_functionalization.py::TestFunctionalization::test_aliases_maintained_after_pass_when_reapplying_views, test/test_functionalization.py::TestFunctionalization::test_as_strided, test/test_functionalization.py::TestFunctionalization::test_batch_norm, test/test_functionalization.py::TestFunctionalization::test_cat, test/test_functionalization.py::TestFunctionalization::test_channels_last_contiguous, test/test_functionalization.py::TestFunctionalization::test_copy_, test/test_functionalization.py::TestFunctionalization::test_copy_stride_mismatch, test/test_functionalization.py::TestFunctionalization::test_diagonal, test/test_functionalization.py::TestFunctionalization::test_diagonal_mutated_input, test/test_functionalization.py::TestFunctionalization::test_everything, test/test_functionalization.py::TestFunctionalization::test_expand_symint, test/test_functionalization.py::TestFunctionalization::test_fill_, test/test_functionalization.py::TestFunctionalization::test_freeze, test/test_functionalization.py::TestFunctionalization::test_index_mutation_on_non_input, test/test_functionalization.py::TestFunctionalization::test_inplace_on_non_view, test/test_functionalization.py::TestFunctionalization::test_instance_norm, test/test_functionalization.py::TestFunctionalization::test_metadata_change, test/test_functionalization.py::TestFunctionalization::test_metadata_change_out_op, test/test_functionalization.py::TestFunctionalization::test_mixed_wrappers_invalid, test/test_functionalization.py::TestFunctionalization::test_mixed_wrappers_valid, test/test_functionalization.py::TestFunctionalization::test_multi_out, test/test_functionalization.py::TestFunctionalization::test_multiple_views_of_same_base, test/test_functionalization.py::TestFunctionalization::test_mutable_op_not_inplace_or_other, test/test_functionalization.py::TestFunctionalization::test_mutation_overlapping_mem, test/test_functionalization.py::TestFunctionalization::test_nested_functions_propagate_updates, test/test_functionalization.py::TestFunctionalization::test_only_one_view, test/test_functionalization.py::TestFunctionalization::test_optional_tensor_list, test/test_functionalization.py::TestFunctionalization::test_python_functionalization, test/test_functionalization.py::TestFunctionalization::test_python_functionalization_conj, test/test_functionalization.py::TestFunctionalization::test_python_functionalization_is_conj, test/test_functionalization.py::TestFunctionalization::test_python_functionalization_is_neg, test/test_functionalization.py::TestFunctionalization::test_python_functionalization_lift_fresh, test/test_functionalization.py::TestFunctionalization::test_python_functionalization_lift_fresh_storage, test/test_functionalization.py::TestFunctionalization::test_python_functionalization_neg, test/test_functionalization.py::TestFunctionalization::test_python_functionalization_zero_tensor, test/test_functionalization.py::TestFunctionalization::test_reapply_views_simple, test/test_functionalization.py::TestFunctionalization::test_resize_larger_invalid, test/test_functionalization.py::TestFunctionalization::test_resize_larger_valid, test/test_functionalization.py::TestFunctionalization::test_resize_same_size_diff_rank, test/test_functionalization.py::TestFunctionalization::test_resize_smaller, test/test_functionalization.py::TestFunctionalization::test_save_for_backwards_segfault, test/test_functionalization.py::TestFunctionalization::test_scalars, test/test_functionalization.py::TestFunctionalization::test_set_, test/test_functionalization.py::TestFunctionalization::test_simple, test/test_functionalization.py::TestFunctionalization::test_simple_out, test/test_functionalization.py::TestFunctionalization::test_slice, test/test_functionalization.py::TestFunctionalization::test_split, test/test_functionalization.py::TestFunctionalization::test_split_with_sizes, test/test_functionalization.py::TestFunctionalization::test_tensor_ctr, test/test_functionalization.py::TestFunctionalization::test_tensor_list_composite, test/test_functionalization.py::TestFunctionalization::test_tensor_list_mixed_functional_nonfunctional, test/test_functionalization.py::TestFunctionalization::test_unbind, test/test_functionalization.py::TestFunctionalization::test_view_clone_view_inplace, test/test_functionalization.py::TestFunctionalization::test_view_inplace, test/test_functionalization.py::TestCrossRefFunctionalization::test_advanced_indexing, test/test_functionalization.py::TestCrossRefFunctionalization::test_advanced_indexing_correct_strides, test/test_functionalization.py::TestCrossRefFunctionalization::test_aliases_maintained_after_pass_when_reapplying_views, test/test_functionalization.py::TestCrossRefFunctionalization::test_as_strided, test/test_functionalization.py::TestCrossRefFunctionalization::test_batch_norm, test/test_functionalization.py::TestCrossRefFunctionalization::test_cat, test/test_functionalization.py::TestCrossRefFunctionalization::test_channels_last_contiguous, test/test_functionalization.py::TestCrossRefFunctionalization::test_copy_, test/test_functionalization.py::TestCrossRefFunctionalization::test_copy_stride_mismatch, test/test_functionalization.py::TestCrossRefFunctionalization::test_diagonal, test/test_functionalization.py::TestCrossRefFunctionalization::test_diagonal_mutated_input, test/test_functionalization.py::TestCrossRefFunctionalization::test_everything, test/test_functionalization.py::TestCrossRefFunctionalization::test_expand_symint, test/test_functionalization.py::TestCrossRefFunctionalization::test_fill_, test/test_functionalization.py::TestCrossRefFunctionalization::test_freeze, test/test_functionalization.py::TestCrossRefFunctionalization::test_index_mutation_on_non_input, test/test_functionalization.py::TestCrossRefFunctionalization::test_inplace_on_non_view, test/test_functionalization.py::TestCrossRefFunctionalization::test_instance_norm, test/test_functionalization.py::TestCrossRefFunctionalization::test_metadata_change, test/test_functionalization.py::TestCrossRefFunctionalization::test_metadata_change_out_op, test/test_functionalization.py::TestCrossRefFunctionalization::test_mixed_wrappers_invalid, test/test_functionalization.py::TestCrossRefFunctionalization::test_mixed_wrappers_valid, test/test_functionalization.py::TestCrossRefFunctionalization::test_multi_out, test/test_functionalization.py::TestCrossRefFunctionalization::test_multiple_views_of_same_base, test/test_functionalization.py::TestCrossRefFunctionalization::test_mutable_op_not_inplace_or_other, test/test_functionalization.py::TestCrossRefFunctionalization::test_mutation_overlapping_mem, test/test_functionalization.py::TestCrossRefFunctionalization::test_nested_functions_propagate_updates, test/test_functionalization.py::TestCrossRefFunctionalization::test_only_one_view, test/test_functionalization.py::TestCrossRefFunctionalization::test_optional_tensor_list, test/test_functionalization.py::TestCrossRefFunctionalization::test_python_functionalization, test/test_functionalization.py::TestCrossRefFunctionalization::test_python_functionalization_conj, test/test_functionalization.py::TestCrossRefFunctionalization::test_python_functionalization_is_conj, test/test_functionalization.py::TestCrossRefFunctionalization::test_python_functionalization_is_neg, test/test_functionalization.py::TestCrossRefFunctionalization::test_python_functionalization_lift_fresh, test/test_functionalization.py::TestCrossRefFunctionalization::test_python_functionalization_lift_fresh_storage, test/test_functionalization.py::TestCrossRefFunctionalization::test_python_functionalization_neg, test/test_functionalization.py::TestCrossRefFunctionalization::test_python_functionalization_zero_tensor, test/test_functionalization.py::TestCrossRefFunctionalization::test_reapply_views_simple, test/test_functionalization.py::TestCrossRefFunctionalization::test_resize_larger_invalid, test/test_functionalization.py::TestCrossRefFunctionalization::test_resize_larger_valid, test/test_functionalization.py::TestCrossRefFunctionalization::test_resize_same_size_diff_rank, test/test_functionalization.py::TestCrossRefFunctionalization::test_resize_smaller, test/test_functionalization.py::TestCrossRefFunctionalization::test_save_for_backwards_segfault, test/test_functionalization.py::TestCrossRefFunctionalization::test_scalars, test/test_functionalization.py::TestCrossRefFunctionalization::test_set_, test/test_functionalization.py::TestCrossRefFunctionalization::test_simple, test/test_functionalization.py::TestCrossRefFunctionalization::test_simple_out, test/test_functionalization.py::TestCrossRefFunctionalization::test_slice, test/test_functionalization.py::TestCrossRefFunctionalization::test_split, test/test_functionalization.py::TestCrossRefFunctionalization::test_split_with_sizes, test/test_functionalization.py::TestCrossRefFunctionalization::test_tensor_ctr, test/test_functionalization.py::TestCrossRefFunctionalization::test_tensor_list_composite, test/test_functionalization.py::TestCrossRefFunctionalization::test_tensor_list_mixed_functional_nonfunctional, test/test_functionalization.py::TestCrossRefFunctionalization::test_unbind, test/test_functionalization.py::TestCrossRefFunctionalization::test_view_clone_view_inplace, test/test_functionalization.py::TestCrossRefFunctionalization::test_view_inplace 2025-08-14T23:32:41.4940823Z 2025-08-14T23:32:41.4941137Z Running test_futures 1/1 ... [2025-08-14 23:32:41.484669] 2025-08-14T23:32:41.4941799Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T23:32:41.4943582Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_futures.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 23:32:41.485204] 2025-08-14T23:32:45.7599102Z 2025-08-14T23:32:45.7600776Z test_futures 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_futures_1.1_d7e848482444a4d5_.log 2025-08-14T23:32:45.7609864Z Running 22 items in this shard: test/test_futures.py::TestFuture::test_add_done_callback_error_is_ignored, test/test_futures.py::TestFuture::test_add_done_callback_maintains_callback_order, test/test_futures.py::TestFuture::test_add_done_callback_no_arg_error_is_ignored, test/test_futures.py::TestFuture::test_add_done_callback_simple, test/test_futures.py::TestFuture::test_chained_then, test/test_futures.py::TestFuture::test_collect_all, test/test_futures.py::TestFuture::test_done, test/test_futures.py::TestFuture::test_done_exception, test/test_futures.py::TestFuture::test_interleaving_then_and_add_done_callback_maintains_callback_order, test/test_futures.py::TestFuture::test_interleaving_then_and_add_done_callback_propagates_error, test/test_futures.py::TestFuture::test_mark_future_twice, test/test_futures.py::TestFuture::test_pickle_future, test/test_futures.py::TestFuture::test_set_exception, test/test_futures.py::TestFuture::test_set_exception_multithreading, test/test_futures.py::TestFuture::test_then, test/test_futures.py::TestFuture::test_then_no_arg, test/test_futures.py::TestFuture::test_then_raise, test/test_futures.py::TestFuture::test_then_wrong_arg, test/test_futures.py::TestFuture::test_wait, test/test_futures.py::TestFuture::test_wait_all, test/test_futures.py::TestFuture::test_wait_multi_thread, test/test_futures.py::TestFuture::test_wait_none 2025-08-14T23:32:45.7615184Z 2025-08-14T23:32:45.7615365Z Running test_fx_experimental 1/1 ... [2025-08-14 23:32:45.759312] 2025-08-14T23:32:45.7615748Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T23:32:45.7616696Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_fx_experimental.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 23:32:45.759652] 2025-08-14T23:33:14.1184217Z 2025-08-14T23:33:14.1185285Z test_fx_experimental 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_fx_experimental_1.1_119462db25ddab2e_.log 2025-08-14T23:33:14.1676516Z Running 723 items in this shard: test/test_fx_experimental.py::TestFXExperimental::test_annotate_getitem_node, test/test_fx_experimental.py::TestFXExperimental::test_annotate_returns_with_schema, test/test_fx_experimental.py::TestFXExperimental::test_aot_based_partition, test/test_fx_experimental.py::TestFXExperimental::test_call_to_assert_no_msg, test/test_fx_experimental.py::TestFXExperimental::test_call_to_assert_with_empty_msg, test/test_fx_experimental.py::TestFXExperimental::test_call_to_assert_with_msg, test/test_fx_experimental.py::TestFXExperimental::test_call_to_assert_with_multiline_message, test/test_fx_experimental.py::TestFXExperimental::test_conv_bn_fusion, test/test_fx_experimental.py::TestFXExperimental::test_conv_bn_fusion_mixed_dtype, test/test_fx_experimental.py::TestFXExperimental::test_conv_bn_fusion_not_running_state, test/test_fx_experimental.py::TestFXExperimental::test_cost_aware_partition, test/test_fx_experimental.py::TestFXExperimental::test_fetch, test/test_fx_experimental.py::TestFXExperimental::test_find_single_partition, test/test_fx_experimental.py::TestFXExperimental::test_lack_of_devices, test/test_fx_experimental.py::TestFXExperimental::test_large_node_error, test/test_fx_experimental.py::TestFXExperimental::test_merge_matmuls, test/test_fx_experimental.py::TestFXExperimental::test_meta_tracer, test/test_fx_experimental.py::TestFXExperimental::test_normalize_args, test/test_fx_experimental.py::TestFXExperimental::test_normalize_args_perserve_type, test/test_fx_experimental.py::TestFXExperimental::test_normalize_args_preserve_meta, test/test_fx_experimental.py::TestFXExperimental::test_normalize_binary_operators, test/test_fx_experimental.py::TestFXExperimental::test_normalize_modules_exhaustive, test/test_fx_experimental.py::TestFXExperimental::test_optimize_for_inference_cpu, test/test_fx_experimental.py::TestFXExperimental::test_optimize_for_inference_cpu_torchvision, test/test_fx_experimental.py::TestFXExperimental::test_partition_device_mapping, test/test_fx_experimental.py::TestFXExperimental::test_partition_latency, test/test_fx_experimental.py::TestFXExperimental::test_partition_node_manipulation, test/test_fx_experimental.py::TestFXExperimental::test_replace_target_nodes_with, test/test_fx_experimental.py::TestFXExperimental::test_saturate_host, test/test_fx_experimental.py::TestFXExperimental::test_size_based_partition, test/test_fx_experimental.py::TestFXExperimental::test_sparse_nn_partition, test/test_fx_experimental.py::TestFXExperimental::test_split_module_dead_code, test/test_fx_experimental.py::TestFXExperimental::test_split_module_default_arg, test/test_fx_experimental.py::TestFXExperimental::test_split_module_input_names, test/test_fx_experimental.py::TestFXExperimental::test_split_module_keep_original_order_and_noop_graph, test/test_fx_experimental.py::TestFXExperimental::test_split_module_kwargs_expansion, test/test_fx_experimental.py::TestFXExperimental::test_split_module_return_node, test/test_fx_experimental.py::TestFXExperimental::test_split_module_symint_dependency_handling, test/test_fx_experimental.py::TestFXExperimental::test_split_qualname_mapping, test/test_fx_experimental.py::TestFXExperimental::test_subgraph_creation, test/test_fx_experimental.py::TestFXExperimental::test_subgraph_trivial_resnet, test/test_fx_experimental.py::TestFXExperimental::test_subgraph_uniquename, test/test_fx_experimental.py::TestFXExperimental::test_to_folder, test/test_fx_experimental.py::TestFXExperimental::test_traceable_function_with_nonstandard_name, test/test_fx_experimental.py::TestFXExperimental::test_type_matches, test/test_fx_experimental.py::TestTranslationValidation::test_sat, test/test_fx_experimental.py::TestTranslationValidation::test_sat_bitwise, test/test_fx_experimental.py::TestTranslationValidation::test_sympy_to_z3, test/test_fx_experimental.py::TestTranslationValidation::test_unsat, test/test_fx_experimental.py::TestTranslationValidation::test_z3str, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_args_op_overload_cuda, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_H_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_T_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive___getitem___cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive___radd___cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive___rdiv___cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive___rmatmul___cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive___rmod___cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive___rmul___cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive___rpow___cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive___rsub___cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive__batch_norm_with_update_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive__chunk_cat_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive__native_batch_norm_legit_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive__segment_reduce_lengths_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive__segment_reduce_offsets_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive__softmax_backward_data_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive__unsafe_masked_index_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive__unsafe_masked_index_put_accumulate_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive__upsample_bilinear2d_aa_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_abs_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_acos_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_acosh_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_add_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_addbmm_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_addcdiv_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_addcmul_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_addmm_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_addmm_decomposed_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_addmv_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_addr_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_alias_copy_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_all_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_allclose_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_amax_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_amin_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_aminmax_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_angle_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_any_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_arange_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_argmax_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_argmin_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_argsort_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_argwhere_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_as_strided_copy_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_as_strided_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_as_strided_partial_views_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_as_strided_scatter_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_asin_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_asinh_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_atan2_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_atan_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_atanh_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_atleast_1d_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_atleast_2d_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_atleast_3d_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_baddbmm_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_bernoulli_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_bfloat16_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_block_diag_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_bmm_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_bool_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_broadcast_shapes_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_broadcast_tensors_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_broadcast_to_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_bucketize_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_byte_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_cartesian_prod_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_cat_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_cauchy_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_cdist_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_cdouble_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_ceil_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_cfloat_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_chalf_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_char_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_cholesky_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_cholesky_inverse_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_cholesky_solve_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_chunk_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_clamp_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_clamp_max_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_clamp_min_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_clone_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_column_stack_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_combinations_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_complex_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_conj_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_conj_physical_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_constant_pad_nd_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_contiguous_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_copysign_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_corrcoef_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_cos_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_cosh_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_count_nonzero_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_cov_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_cross_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_cummax_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_cummin_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_cumprod_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_cumsum_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_cumulative_trapezoid_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_deg2rad_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_diag_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_diag_embed_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_diagflat_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_diagonal_copy_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_diagonal_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_diagonal_scatter_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_diff_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_digamma_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_dist_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_div_floor_rounding_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_div_no_rounding_mode_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_div_trunc_rounding_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_dot_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_double_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_dsplit_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_dstack_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_einsum_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_empty_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_empty_like_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_empty_permuted_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_empty_strided_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_eq_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_equal_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_erf_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_erfc_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_erfinv_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_exp2_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_exp_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_expand_as_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_expand_copy_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_expand_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_expm1_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_exponential_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_eye_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_fft_fft2_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_fft_fft_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_fft_fftn_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_fft_fftshift_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_fft_hfft2_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_fft_hfft_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_fft_hfftn_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_fft_ifft2_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_fft_ifft_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_fft_ifftn_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_fft_ifftshift_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_fft_ihfft2_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_fft_ihfft_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_fft_ihfftn_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_fft_irfft2_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_fft_irfft_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_fft_irfftn_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_fft_rfft2_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_fft_rfft_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_fft_rfftn_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_fill_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_flatten_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_flip_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_fliplr_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_flipud_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_float_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_float_power_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_floor_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_floor_divide_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_fmax_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_fmin_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_fmod_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_frac_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_frexp_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_full_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_full_like_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_gather_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_ge_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_geometric_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_geqrf_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_gradient_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_grid_sampler_2d_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_gt_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_half_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_hash_tensor_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_heaviside_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_histc_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_hsplit_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_hstack_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_hypot_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_i0_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_igamma_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_igammac_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_index_add_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_index_copy_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_index_fill_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_index_put_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_index_reduce_amax_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_index_reduce_amin_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_index_reduce_mean_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_index_reduce_prod_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_index_select_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_inner_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_int_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_isclose_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_isfinite_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_isin_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_isinf_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_isnan_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_isneginf_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_isposinf_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_isreal_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_item_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_jiterator_2inputs_2outputs_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_jiterator_4inputs_with_extra_args_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_jiterator_binary_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_jiterator_binary_return_by_ref_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_jiterator_unary_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_kron_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_kthvalue_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_ldexp_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_le_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_lerp_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_lgamma_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_linalg_cholesky_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_linalg_cholesky_ex_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_linalg_cond_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_linalg_cross_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_linalg_det_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_linalg_diagonal_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_linalg_eig_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_linalg_eigh_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_linalg_eigvals_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_linalg_eigvalsh_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_linalg_householder_product_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_linalg_inv_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_linalg_inv_ex_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_linalg_ldl_factor_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_linalg_ldl_factor_ex_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_linalg_ldl_solve_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_linalg_lstsq_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_linalg_lstsq_grad_oriented_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_linalg_lu_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_linalg_lu_factor_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_linalg_lu_factor_ex_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_linalg_lu_solve_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_linalg_matrix_norm_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_linalg_matrix_power_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_linalg_matrix_rank_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_linalg_matrix_rank_hermitian_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_linalg_multi_dot_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_linalg_norm_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_linalg_norm_subgradients_at_zero_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_linalg_pinv_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_linalg_pinv_hermitian_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_linalg_pinv_singular_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_linalg_qr_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_linalg_slogdet_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_linalg_solve_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_linalg_solve_ex_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_linalg_solve_triangular_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_linalg_svd_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_linalg_svdvals_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_linalg_tensorinv_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_linalg_tensorsolve_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_linalg_vander_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_linalg_vecdot_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_linalg_vector_norm_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_linspace_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_linspace_tensor_overload_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_log10_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_log1p_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_log2_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_log_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_log_normal_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_log_softmax_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_log_softmax_with_dtype_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_logaddexp2_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_logaddexp_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_logcumsumexp_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_logdet_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_logical_and_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_logical_not_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_logical_or_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_logical_xor_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_logit_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_logspace_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_logspace_tensor_overload_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_logsumexp_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_long_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_lt_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_lu_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_lu_solve_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_lu_unpack_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_mH_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_mT_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_masked_amax_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_masked_amin_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_masked_argmax_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_masked_argmin_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_masked_cumprod_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_masked_cumsum_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_masked_fill_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_masked_log_softmax_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_masked_logaddexp_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_masked_logsumexp_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_masked_mean_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_masked_median_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_masked_norm_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_masked_normalize_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_masked_prod_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_masked_scatter_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_masked_select_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_masked_softmax_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_masked_softmin_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_masked_std_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_masked_sum_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_masked_var_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_matmul_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_matrix_exp_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_max_binary_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_max_pool2d_with_indices_backward_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_max_reduction_no_dim_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_max_reduction_with_dim_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_maximum_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_mean_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_median_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_meshgrid_list_of_tensors_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_meshgrid_variadic_tensors_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_min_binary_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_min_reduction_no_dim_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_min_reduction_with_dim_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_minimum_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_mm_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_mode_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_movedim_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_msort_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_mul_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_multinomial_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_mv_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_mvlgamma_mvlgamma_p_1_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_mvlgamma_mvlgamma_p_5_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nan_to_num_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nanmean_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nanmedian_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nanquantile_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nansum_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_narrow_copy_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_narrow_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_native_batch_norm_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_native_dropout_backward_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_native_layer_norm_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_ne_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_neg_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_new_empty_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_new_empty_strided_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_new_full_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_new_ones_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_new_zeros_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nextafter_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_adaptive_avg_pool1d_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_adaptive_avg_pool2d_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_adaptive_avg_pool3d_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_adaptive_max_pool1d_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_adaptive_max_pool2d_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_adaptive_max_pool3d_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_alpha_dropout_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_avg_pool1d_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_avg_pool2d_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_avg_pool3d_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_batch_norm_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_batch_norm_without_cudnn_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_bilinear_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_binary_cross_entropy_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_celu_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_channel_shuffle_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_conv1d_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_conv2d_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_conv3d_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_conv_transpose1d_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_conv_transpose2d_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_conv_transpose3d_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_cosine_embedding_loss_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_cosine_similarity_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_cross_entropy_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_ctc_loss_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_dropout2d_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_dropout3d_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_dropout_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_elu_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_embedding_bag_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_embedding_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_feature_alpha_dropout_without_train_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_fractional_max_pool2d_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_fractional_max_pool3d_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_gaussian_nll_loss_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_gelu_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_glu_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_grid_sample_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_group_norm_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_hardshrink_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_hardsigmoid_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_hardswish_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_hardtanh_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_hinge_embedding_loss_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_huber_loss_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_instance_norm_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_interpolate_area_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_interpolate_bicubic_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_interpolate_bilinear_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_interpolate_linear_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_interpolate_nearest-exact_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_interpolate_nearest_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_interpolate_trilinear_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_kl_div_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_l1_loss_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_layer_norm_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_leaky_relu_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_linear_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_local_response_norm_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_logsigmoid_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_margin_ranking_loss_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_max_pool1d_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_max_pool2d_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_max_pool3d_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_max_unpool1d_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_max_unpool1d_grad_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_max_unpool2d_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_max_unpool2d_grad_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_max_unpool3d_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_max_unpool3d_grad_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_mish_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_mse_loss_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_multi_head_attention_forward_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_multi_margin_loss_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_multilabel_margin_loss_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_nll_loss_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_normalize_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_pad_circular_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_pad_constant_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_pad_reflect_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_pad_replicate_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_pad_replicate_negative_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_pairwise_distance_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_pdist_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_pixel_shuffle_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_pixel_unshuffle_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_poisson_nll_loss_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_prelu_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_relu6_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_relu_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_rms_norm_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_rrelu_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_scaled_dot_product_attention_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_selu_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_silu_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_smooth_l1_loss_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_soft_margin_loss_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_softmin_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_softmin_with_dtype_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_softplus_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_softshrink_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_softsign_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_tanhshrink_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_threshold_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_triplet_margin_loss_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_unfold_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_upsample_bilinear_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nn_functional_upsample_nearest_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nonzero_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_nonzero_static_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_norm_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_norm_fro_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_norm_inf_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_norm_nuc_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_normal_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_normal_in_place_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_normal_number_mean_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_ones_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_ones_like_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_ormqr_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_outer_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_pca_lowrank_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_permute_copy_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_permute_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_pinverse_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_polar_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_polygamma_polygamma_n_0_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_polygamma_polygamma_n_1_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_polygamma_polygamma_n_2_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_polygamma_polygamma_n_3_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_polygamma_polygamma_n_4_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_positive_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_pow_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_prod_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_put_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_qr_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_quantile_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_rad2deg_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_rand_like_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_randint_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_randint_like_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_randn_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_randn_like_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_ravel_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_real_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_reciprocal_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_remainder_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_renorm_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_repeat_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_repeat_interleave_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_reshape_as_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_reshape_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_resize__cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_resize_as__cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_resolve_conj_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_resolve_neg_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_roll_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_rot90_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_round_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_round_decimals_0_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_round_decimals_3_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_round_decimals_neg_3_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_rsqrt_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_rsub_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_scalar_tensor_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_scatter_add_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_scatter_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_scatter_reduce_amax_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_scatter_reduce_amin_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_scatter_reduce_mean_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_scatter_reduce_prod_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_scatter_reduce_sum_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_searchsorted_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_select_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_select_scatter_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_sgn_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_short_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_sigmoid_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_sign_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_signal_windows_bartlett_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_signal_windows_blackman_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_signal_windows_cosine_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_signal_windows_exponential_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_signal_windows_gaussian_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_signal_windows_general_cosine_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_signal_windows_general_hamming_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_signal_windows_hamming_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_signal_windows_hann_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_signal_windows_kaiser_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_signal_windows_nuttall_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_signbit_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_sin_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_sinc_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_sinh_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_slice_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_slice_scatter_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_softmax_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_softmax_with_dtype_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_sort_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_sparse_mm_reduce_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_sparse_sampled_addmm_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_special_airy_ai_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_special_bessel_j0_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_special_bessel_j1_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_special_bessel_y0_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_special_bessel_y1_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_special_chebyshev_polynomial_t_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_special_chebyshev_polynomial_u_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_special_chebyshev_polynomial_v_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_special_chebyshev_polynomial_w_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_special_entr_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_special_erfcx_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_special_hermite_polynomial_h_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_special_hermite_polynomial_he_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_special_i0e_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_special_i1_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_special_i1e_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_special_laguerre_polynomial_l_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_special_legendre_polynomial_p_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_special_log_ndtr_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_special_modified_bessel_i0_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_special_modified_bessel_i1_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_special_modified_bessel_k0_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_special_modified_bessel_k1_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_special_ndtr_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_special_ndtri_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_special_polygamma_special_polygamma_n_0_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_special_scaled_modified_bessel_k0_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_special_scaled_modified_bessel_k1_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_special_spherical_bessel_j0_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_special_xlog1py_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_special_zeta_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_split_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_split_list_args_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_split_with_sizes_copy_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_split_with_sizes_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_sqrt_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_square_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_squeeze_copy_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_squeeze_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_squeeze_multiple_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_stack_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_std_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_std_mean_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_std_mean_unbiased_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_std_unbiased_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_stft_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_sub_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_sum_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_sum_to_size_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_svd_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_svd_lowrank_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_t_copy_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_t_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_take_along_dim_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_take_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_tan_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_tanh_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_tensor_split_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_tensordot_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_tile_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_to_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_to_sparse_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_topk_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_torch_ops_aten__efficient_attention_forward_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_torch_ops_aten__safe_softmax_default_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_trace_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_transpose_copy_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_transpose_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_trapezoid_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_trapz_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_triangular_solve_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_tril_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_triu_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_true_divide_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_trunc_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_unbind_copy_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_unbind_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_unflatten_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_unfold_copy_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_unfold_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_uniform_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_unique_consecutive_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_unique_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_unsafe_chunk_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_unsafe_split_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_unsqueeze_copy_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_unsqueeze_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_var_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_var_mean_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_var_mean_unbiased_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_var_unbiased_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_vdot_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_view_as_complex_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_view_as_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_view_copy_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_view_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_vsplit_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_vstack_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_where_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_xlogy_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_zero__cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_zeros_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_operator_exhaustive_zeros_like_cuda_float32, test/test_fx_experimental.py::TestNormalizeOperatorsCUDA::test_normalize_quantized_eb_cuda 2025-08-14T23:33:14.2183874Z 2025-08-14T23:33:14.2184128Z Running test_fx_passes 1/1 ... [2025-08-14 23:33:14.118935] 2025-08-14T23:33:14.2184657Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T23:33:14.2186006Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_fx_passes.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 23:33:14.119277] 2025-08-14T23:33:17.9417915Z 2025-08-14T23:33:17.9418998Z test_fx_passes 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_fx_passes_1.1_b17554a4255e7d59_.log 2025-08-14T23:33:17.9444302Z Running 53 items in this shard: test/test_fx_passes.py::TestFXGraphPasses::test_fuser_pass_deep_model, test/test_fx_passes.py::TestFXGraphPasses::test_fuser_util_partition0, test/test_fx_passes.py::TestFXGraphPasses::test_fuser_util_partition1, test/test_fx_passes.py::TestFXGraphPasses::test_fuser_util_partition10, test/test_fx_passes.py::TestFXGraphPasses::test_fuser_util_partition11, test/test_fx_passes.py::TestFXGraphPasses::test_fuser_util_partition2, test/test_fx_passes.py::TestFXGraphPasses::test_fuser_util_partition3, test/test_fx_passes.py::TestFXGraphPasses::test_fuser_util_partition4, test/test_fx_passes.py::TestFXGraphPasses::test_fuser_util_partition5, test/test_fx_passes.py::TestFXGraphPasses::test_fuser_util_partition6, test/test_fx_passes.py::TestFXGraphPasses::test_fuser_util_partition7, test/test_fx_passes.py::TestFXGraphPasses::test_fuser_util_partition8, test/test_fx_passes.py::TestFXGraphPasses::test_fuser_util_partition9, test/test_fx_passes.py::TestFXGraphPasses::test_fuser_util_xfail_partition0, test/test_fx_passes.py::TestFXGraphPasses::test_fuser_util_xfail_partition1, test/test_fx_passes.py::TestFXGraphPasses::test_fuser_util_xfail_partition2, test/test_fx_passes.py::TestFXGraphPasses::test_fuser_util_xfail_partition3, test/test_fx_passes.py::TestFXGraphPasses::test_partitioner_fn0_expected_partition0_bookend_non_compute_pass_False, test/test_fx_passes.py::TestFXGraphPasses::test_partitioner_fn10_expected_partition10_bookend_non_compute_pass_False, test/test_fx_passes.py::TestFXGraphPasses::test_partitioner_fn11_expected_partition11_bookend_non_compute_pass_False, test/test_fx_passes.py::TestFXGraphPasses::test_partitioner_fn12_expected_partition12_bookend_non_compute_pass_False, test/test_fx_passes.py::TestFXGraphPasses::test_partitioner_fn13_expected_partition13_bookend_non_compute_pass_False, test/test_fx_passes.py::TestFXGraphPasses::test_partitioner_fn14_expected_partition14_bookend_non_compute_pass_True, test/test_fx_passes.py::TestFXGraphPasses::test_partitioner_fn15_expected_partition15_bookend_non_compute_pass_False, test/test_fx_passes.py::TestFXGraphPasses::test_partitioner_fn16_expected_partition16_bookend_non_compute_pass_True, test/test_fx_passes.py::TestFXGraphPasses::test_partitioner_fn17_expected_partition17_bookend_non_compute_pass_False, test/test_fx_passes.py::TestFXGraphPasses::test_partitioner_fn18_expected_partition18_bookend_non_compute_pass_False, test/test_fx_passes.py::TestFXGraphPasses::test_partitioner_fn1_expected_partition1_bookend_non_compute_pass_False, test/test_fx_passes.py::TestFXGraphPasses::test_partitioner_fn2_expected_partition2_bookend_non_compute_pass_False, test/test_fx_passes.py::TestFXGraphPasses::test_partitioner_fn3_expected_partition3_bookend_non_compute_pass_False, test/test_fx_passes.py::TestFXGraphPasses::test_partitioner_fn4_expected_partition4_bookend_non_compute_pass_False, test/test_fx_passes.py::TestFXGraphPasses::test_partitioner_fn5_expected_partition5_bookend_non_compute_pass_False, test/test_fx_passes.py::TestFXGraphPasses::test_partitioner_fn6_expected_partition6_bookend_non_compute_pass_False, test/test_fx_passes.py::TestFXGraphPasses::test_partitioner_fn7_expected_partition7_bookend_non_compute_pass_False, test/test_fx_passes.py::TestFXGraphPasses::test_partitioner_fn8_expected_partition8_bookend_non_compute_pass_False, test/test_fx_passes.py::TestFXGraphPasses::test_partitioner_fn9_expected_partition9_bookend_non_compute_pass_False, test/test_fx_passes.py::TestFXGraphPasses::test_partitioner_independent_output_fn0_expected_partition0, test/test_fx_passes.py::TestFXMatcherUtils::test_subgraph_matcher_test_model0, test/test_fx_passes.py::TestFXMatcherUtils::test_subgraph_matcher_test_model1, test/test_fx_passes.py::TestFXMatcherUtils::test_subgraph_matcher_test_model10, test/test_fx_passes.py::TestFXMatcherUtils::test_subgraph_matcher_test_model11, test/test_fx_passes.py::TestFXMatcherUtils::test_subgraph_matcher_test_model12, test/test_fx_passes.py::TestFXMatcherUtils::test_subgraph_matcher_test_model13, test/test_fx_passes.py::TestFXMatcherUtils::test_subgraph_matcher_test_model14, test/test_fx_passes.py::TestFXMatcherUtils::test_subgraph_matcher_test_model15, test/test_fx_passes.py::TestFXMatcherUtils::test_subgraph_matcher_test_model2, test/test_fx_passes.py::TestFXMatcherUtils::test_subgraph_matcher_test_model3, test/test_fx_passes.py::TestFXMatcherUtils::test_subgraph_matcher_test_model4, test/test_fx_passes.py::TestFXMatcherUtils::test_subgraph_matcher_test_model5, test/test_fx_passes.py::TestFXMatcherUtils::test_subgraph_matcher_test_model6, test/test_fx_passes.py::TestFXMatcherUtils::test_subgraph_matcher_test_model7, test/test_fx_passes.py::TestFXMatcherUtils::test_subgraph_matcher_test_model8, test/test_fx_passes.py::TestFXMatcherUtils::test_subgraph_matcher_test_model9 2025-08-14T23:33:17.9476475Z 2025-08-14T23:33:17.9476829Z Running test_fx_reinplace_pass 1/1 ... [2025-08-14 23:33:17.941805] 2025-08-14T23:33:17.9477563Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T23:33:17.9479390Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_fx_reinplace_pass.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 23:33:17.942348] 2025-08-14T23:33:21.9161007Z 2025-08-14T23:33:21.9162292Z test_fx_reinplace_pass 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_fx_reinplace_pass_1.1_91a7e6deb800af7c_.log 2025-08-14T23:33:21.9170494Z Running 12 items in this shard: test/test_fx_reinplace_pass.py::TestReinplacePass::test_out_node_updated, test/test_fx_reinplace_pass.py::TestReinplacePass::test_reinplace_basic, test/test_fx_reinplace_pass.py::TestReinplacePass::test_reinplace_different_metadata, test/test_fx_reinplace_pass.py::TestReinplacePass::test_reinplace_index_mutation, test/test_fx_reinplace_pass.py::TestReinplacePass::test_reinplace_overlapping_memory, test/test_fx_reinplace_pass.py::TestReinplacePass::test_reinplace_scatter_op, test/test_fx_reinplace_pass.py::TestReinplacePass::test_reinplace_scatter_twice, test/test_fx_reinplace_pass.py::TestReinplacePass::test_reinplace_scatter_twice_with_different_view_op_invalid, test/test_fx_reinplace_pass.py::TestReinplacePass::test_reinplace_scatter_twice_with_different_view_op_invalid2, test/test_fx_reinplace_pass.py::TestReinplacePass::test_reinplace_scatter_twice_with_different_view_op_valid, test/test_fx_reinplace_pass.py::TestReinplacePass::test_reinplace_sym_input, test/test_fx_reinplace_pass.py::TestReinplacePass::test_reinplace_with_view 2025-08-14T23:33:21.9178006Z 2025-08-14T23:33:21.9178167Z Running test_hub 1/1 ... [2025-08-14 23:33:21.915839] 2025-08-14T23:33:21.9178502Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T23:33:21.9179404Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_hub.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 23:33:21.916424] 2025-08-14T23:34:55.6457421Z 2025-08-14T23:34:55.6458772Z test_binary_ufuncs 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_binary_ufuncs_1.1_8fcc757f50be670a_.log 2025-08-14T23:34:56.0379688Z Running 12857 items in this shard: test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___add___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___add___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___add___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___add___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___add___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___add___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___add___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___add___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___and___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___and___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___and___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___and___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___and___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___and___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___and___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___and___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___eq___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___eq___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___eq___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___eq___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___eq___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___eq___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___eq___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___eq___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___floordiv___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___floordiv___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___floordiv___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___floordiv___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___floordiv___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___floordiv___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___floordiv___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___floordiv___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ge___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ge___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ge___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ge___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ge___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ge___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ge___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ge___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___gt___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___gt___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___gt___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___gt___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___gt___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___gt___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___gt___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___gt___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___iadd___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___iadd___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___iadd___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___iadd___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___iadd___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___iadd___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___iadd___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___iadd___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___iand___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___iand___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___iand___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___iand___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___iand___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___iand___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___iand___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___iand___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ifloordiv___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ifloordiv___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ifloordiv___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ifloordiv___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ifloordiv___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ifloordiv___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ifloordiv___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ifloordiv___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ilshift___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ilshift___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ilshift___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ilshift___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ilshift___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ilshift___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ilshift___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ilshift___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___imod___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___imod___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___imod___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___imod___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___imod___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___imod___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___imod___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___imod___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___imul___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___imul___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___imul___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___imul___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___imul___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___imul___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___imul___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___imul___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ior___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ior___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ior___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ior___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ior___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ior___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ior___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ior___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ipow___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ipow___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ipow___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ipow___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ipow___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ipow___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ipow___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ipow___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___irshift___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___irshift___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___irshift___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___irshift___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___irshift___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___irshift___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___irshift___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___irshift___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___isub___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___isub___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___isub___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___isub___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___isub___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___isub___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___isub___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___isub___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___itruediv___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___itruediv___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___itruediv___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___itruediv___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___itruediv___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___itruediv___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___itruediv___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___itruediv___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ixor___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ixor___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ixor___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ixor___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ixor___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ixor___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ixor___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ixor___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___le___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___le___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___le___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___le___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___le___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___le___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___le___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___le___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___lshift___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___lshift___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___lshift___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___lshift___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___lshift___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___lshift___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___lshift___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___lshift___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___lt___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___lt___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___lt___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___lt___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___lt___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___lt___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___lt___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___lt___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___matmul___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___matmul___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___matmul___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___matmul___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___matmul___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___matmul___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___matmul___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___matmul___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___mod___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___mod___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___mod___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___mod___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___mod___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___mod___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___mod___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___mod___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___mul___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___mul___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___mul___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___mul___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___mul___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___mul___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___mul___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___mul___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ne___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ne___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ne___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ne___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ne___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ne___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ne___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ne___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___or___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___or___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___or___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___or___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___or___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___or___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___or___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___or___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___pow___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___pow___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___pow___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___pow___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___pow___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___pow___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___pow___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___pow___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___radd___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___radd___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___radd___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___radd___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___radd___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___radd___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___radd___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___radd___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rand___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rand___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rand___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rand___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rand___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rand___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rand___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rand___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rfloordiv___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rfloordiv___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rfloordiv___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rfloordiv___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rfloordiv___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rfloordiv___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rfloordiv___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rfloordiv___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rlshift___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rlshift___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rlshift___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rlshift___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rlshift___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rlshift___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rlshift___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rlshift___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmatmul___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmatmul___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmatmul___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmatmul___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmatmul___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmatmul___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmatmul___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmatmul___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmod___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmod___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmod___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmod___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmod___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmod___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmod___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmod___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmul___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmul___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmul___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmul___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmul___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmul___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmul___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmul___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ror___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ror___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ror___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ror___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ror___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ror___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ror___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ror___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rpow___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rpow___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rpow___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rpow___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rpow___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rpow___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rpow___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rpow___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rrshift___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rrshift___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rrshift___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rrshift___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rrshift___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rrshift___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rrshift___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rrshift___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rshift___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rshift___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rshift___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rshift___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rshift___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rshift___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rshift___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rshift___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rsub___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rsub___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rsub___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rsub___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rsub___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rsub___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rsub___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rsub___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rtruediv___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rtruediv___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rtruediv___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rtruediv___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rtruediv___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rtruediv___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rtruediv___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rtruediv___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rxor___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rxor___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rxor___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rxor___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rxor___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rxor___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rxor___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rxor___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___sub___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___sub___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___sub___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___sub___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___sub___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___sub___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___sub___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___sub___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___truediv___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___truediv___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___truediv___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___truediv___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___truediv___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___truediv___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___truediv___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___truediv___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___xor___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___xor___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___xor___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___xor___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___xor___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___xor___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___xor___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___xor___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_add_broadcast_empty_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_add_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_add_with_tail_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_addcmul_scalars_as_floats_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_addsub_half_tensor_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_atan2_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_atan2_edgecases_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___radd___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___radd___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___radd___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___radd___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___radd___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___radd___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___radd___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___radd___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___radd___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___radd___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___radd___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___radd___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rand___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rand___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rand___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rand___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rand___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rand___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rdiv___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rdiv___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rdiv___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rdiv___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rdiv___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rdiv___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rdiv___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rdiv___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rdiv___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rdiv___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rdiv___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rdiv___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rmod___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rmod___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rmod___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rmod___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rmod___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rmod___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rmod___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rmod___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rmod___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rmul___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rmul___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rmul___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rmul___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rmul___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rmul___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rmul___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rmul___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rmul___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rmul___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rmul___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rmul___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___ror___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___ror___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___ror___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___ror___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___ror___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___ror___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rpow___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rpow___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rpow___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rpow___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rpow___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rpow___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rpow___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rpow___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rpow___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rpow___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rpow___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rsub___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rsub___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rsub___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rsub___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rsub___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rsub___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rsub___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rsub___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rsub___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rsub___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rsub___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rxor___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rxor___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rxor___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rxor___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rxor___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rxor___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs__conversions_complex_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs__conversions_complex_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs__conversions_complex_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs__conversions_polar_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs__conversions_polar_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_add_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_add_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_add_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_add_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_add_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_atan2_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_atan2_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_atan2_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_atan2_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_atan2_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_atan2_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_atan2_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_atan2_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_atan2_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_atan2_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_left_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_left_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_left_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_left_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_left_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_right_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_right_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_right_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_right_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_right_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_clamp_max_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_clamp_max_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_clamp_max_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_clamp_max_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_clamp_min_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_clamp_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_clamp_min_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_clamp_min_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_copysign_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_copysign_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_copysign_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_copysign_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_copysign_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_copysign_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_copysign_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_copysign_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_copysign_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_copysign_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_floor_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_floor_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_floor_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_floor_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_floor_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_floor_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_floor_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_floor_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_floor_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_no_rounding_mode_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_no_rounding_mode_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_no_rounding_mode_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_no_rounding_mode_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_no_rounding_mode_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_no_rounding_mode_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_no_rounding_mode_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_no_rounding_mode_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_no_rounding_mode_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_no_rounding_mode_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_no_rounding_mode_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_no_rounding_mode_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_no_rounding_mode_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_trunc_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_trunc_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_trunc_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_trunc_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_trunc_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_trunc_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_trunc_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_trunc_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_trunc_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_eq_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_eq_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_eq_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_eq_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_eq_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_float_power_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_float_power_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_float_power_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_float_power_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_floor_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_floor_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_floor_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmax_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmax_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmax_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmax_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmax_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmax_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmax_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmax_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmax_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmax_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmin_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmin_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmin_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmin_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmin_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmin_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmin_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmin_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmin_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmin_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmod_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmod_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmod_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmod_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_gcd_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_gcd_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ge_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ge_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ge_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_gt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_gt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_gt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_gt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_gt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_heaviside_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_heaviside_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_heaviside_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_heaviside_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_hypot_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_hypot_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_hypot_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_hypot_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_igamma_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_igamma_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_igammac_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_igammac_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_isclose_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_isclose_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_isclose_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_isclose_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_isclose_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_lcm_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_lcm_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_le_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_le_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_le_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_le_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logaddexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logaddexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logaddexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logaddexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_lt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_lt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_lt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_lt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_maximum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_maximum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_maximum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_maximum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_maximum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_minimum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_minimum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_minimum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_minimum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_mul_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_mul_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_mul_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_mul_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_mul_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_mul_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_mul_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_mul_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_mul_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_mul_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_mul_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_mul_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_mul_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ne_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ne_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ne_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_nextafter_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_nextafter_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_nextafter_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_nextafter_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_pow_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_pow_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_pow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_remainder_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_remainder_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_rsub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_rsub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_rsub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_rsub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_rsub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_rsub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_rsub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_rsub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_rsub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_rsub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_rsub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_special_xlog1py_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_special_xlog1py_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_special_xlog1py_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_special_xlog1py_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_special_xlog1py_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_special_xlog1py_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_special_xlog1py_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_special_xlog1py_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_special_xlog1py_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_special_xlog1py_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_special_zeta_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_special_zeta_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_special_zeta_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_special_zeta_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_special_zeta_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_special_zeta_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_special_zeta_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_special_zeta_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_sub_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_sub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_sub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_true_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_true_divide_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_true_divide_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_true_divide_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_true_divide_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_true_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_true_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_true_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_true_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_true_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_true_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_true_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_true_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_xlogy_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_xlogy_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_xlogy_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_xlogy_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_xlogy_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_xlogy_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_xlogy_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_xlogy_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_xlogy_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_xlogy_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_add_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_add_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_add_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_add_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_add_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_atan2_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_atan2_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_atan2_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_atan2_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_atan2_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_atan2_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_atan2_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_atan2_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_atan2_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_atan2_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_left_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_left_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_left_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_left_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_left_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_right_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_right_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_right_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_right_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_right_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_clamp_max_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_clamp_max_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_clamp_max_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_clamp_max_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_clamp_min_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_clamp_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_clamp_min_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_clamp_min_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_complex_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_complex_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_complex_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_copysign_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_copysign_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_copysign_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_copysign_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_copysign_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_copysign_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_copysign_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_copysign_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_copysign_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_copysign_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_floor_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_floor_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_floor_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_floor_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_floor_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_floor_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_floor_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_floor_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_floor_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_no_rounding_mode_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_no_rounding_mode_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_no_rounding_mode_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_no_rounding_mode_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_no_rounding_mode_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_no_rounding_mode_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_no_rounding_mode_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_no_rounding_mode_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_no_rounding_mode_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_no_rounding_mode_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_no_rounding_mode_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_no_rounding_mode_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_no_rounding_mode_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_trunc_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_trunc_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_trunc_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_trunc_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_trunc_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_trunc_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_trunc_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_trunc_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_trunc_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_eq_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_eq_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_eq_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_eq_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_eq_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_float_power_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_float_power_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_float_power_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_float_power_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_floor_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_floor_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_floor_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmax_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmax_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmax_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmax_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmax_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmax_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmax_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmax_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmax_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmax_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmin_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmin_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmin_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmin_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmin_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmin_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmin_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmin_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmin_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmin_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmod_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmod_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmod_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmod_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_gcd_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_gcd_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ge_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ge_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ge_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_gt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_gt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_gt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_gt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_gt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_heaviside_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_heaviside_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_heaviside_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_heaviside_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_hypot_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_hypot_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_hypot_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_hypot_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_igamma_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_igamma_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_igammac_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_igammac_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_isclose_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_isclose_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_isclose_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_isclose_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_isclose_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_return_by_ref_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_return_by_ref_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_return_by_ref_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_return_by_ref_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_return_by_ref_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_return_by_ref_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_return_by_ref_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_return_by_ref_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_return_by_ref_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_return_by_ref_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_return_by_ref_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_return_by_ref_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_lcm_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_lcm_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ldexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ldexp_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ldexp_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ldexp_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ldexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ldexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ldexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ldexp_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ldexp_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ldexp_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ldexp_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ldexp_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_le_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_le_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_le_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_le_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logaddexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logaddexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logaddexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logaddexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_lt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_lt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_lt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_lt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_max_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_max_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_max_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_max_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_max_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_max_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_max_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_max_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_max_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_max_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_maximum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_maximum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_maximum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_maximum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_maximum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_min_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_min_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_min_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_min_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_min_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_min_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_min_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_min_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_min_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_min_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_minimum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_minimum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_minimum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_minimum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_mul_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_mul_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_mul_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_mul_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_mul_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_mul_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_mul_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_mul_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_mul_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_mul_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_mul_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_mul_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_mul_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ne_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ne_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ne_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_nextafter_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_nextafter_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_nextafter_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_nextafter_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_polar_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_polar_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_pow_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_pow_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_pow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_remainder_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_remainder_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_rsub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_rsub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_rsub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_rsub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_rsub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_rsub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_rsub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_rsub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_rsub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_rsub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_rsub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_t_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_t_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_t_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_t_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_t_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_t_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_t_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_t_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_u_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_u_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_u_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_u_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_u_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_u_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_u_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_u_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_v_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_v_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_v_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_v_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_v_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_v_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_v_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_v_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_w_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_w_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_w_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_w_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_w_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_w_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_w_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_w_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_hermite_polynomial_h_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_hermite_polynomial_h_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_hermite_polynomial_h_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_hermite_polynomial_h_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_hermite_polynomial_h_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_hermite_polynomial_h_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_hermite_polynomial_h_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_hermite_polynomial_h_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_hermite_polynomial_he_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_hermite_polynomial_he_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_hermite_polynomial_he_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_hermite_polynomial_he_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_hermite_polynomial_he_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_hermite_polynomial_he_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_hermite_polynomial_he_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_hermite_polynomial_he_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_laguerre_polynomial_l_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_laguerre_polynomial_l_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_laguerre_polynomial_l_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_laguerre_polynomial_l_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_laguerre_polynomial_l_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_laguerre_polynomial_l_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_laguerre_polynomial_l_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_laguerre_polynomial_l_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_legendre_polynomial_p_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_legendre_polynomial_p_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_legendre_polynomial_p_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_legendre_polynomial_p_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_legendre_polynomial_p_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_legendre_polynomial_p_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_legendre_polynomial_p_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_legendre_polynomial_p_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_t_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_t_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_t_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_t_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_t_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_t_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_t_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_u_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_u_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_u_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_u_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_u_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_u_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_u_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_v_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_v_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_v_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_v_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_v_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_v_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_v_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_w_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_w_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_w_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_w_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_w_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_w_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_w_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_xlog1py_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_xlog1py_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_xlog1py_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_xlog1py_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_xlog1py_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_xlog1py_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_xlog1py_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_xlog1py_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_xlog1py_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_xlog1py_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_zeta_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_zeta_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_zeta_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_zeta_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_zeta_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_zeta_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_zeta_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_zeta_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_sub_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_sub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_sub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_true_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_true_divide_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_true_divide_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_true_divide_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_true_divide_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_true_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_true_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_true_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_true_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_true_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_true_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_true_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_true_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_xlogy_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_xlogy_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_xlogy_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_xlogy_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_xlogy_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_xlogy_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_xlogy_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_xlogy_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_xlogy_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_xlogy_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_binary_op_mem_overlap_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_binary_op_scalar_device_unspecified_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_binary_ops_with_scalars_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_bitwise_ops_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_bitwise_ops_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_bitwise_ops_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_bitwise_ops_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_bitwise_ops_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_bitwise_ops_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_bool_tensor_comparison_ops_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_jiterator_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_jiterator_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_jiterator_binary_return_by_ref_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_jiterator_binary_return_by_ref_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_max_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_max_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_min_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_min_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_cdiv_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_cmul_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_bfloat16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_bfloat16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_bool_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_complex128_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_complex128_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_complex128_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_complex128_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_complex64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_complex64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_complex64_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_complex64_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_complex64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float32_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float32_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float32_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float32_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float32_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float32_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float32_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float32_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float32_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float32_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float32_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float32_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float64_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float64_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int16_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int16_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int32_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int32_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int32_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int32_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int32_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int32_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int32_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int64_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int64_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int8_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int8_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int8_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int8_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int8_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int8_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int8_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int8_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int8_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_uint8_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_uint8_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_uint8_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_uint8_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_uint8_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_uint8_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_uint8_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_uint8_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_uint8_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_uint8_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_complex_div_underflow_overflow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_complex_div_underflow_overflow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_complex_scalar_pow_tensor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_complex_scalar_pow_tensor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_complex_scalar_pow_tensor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_complex_scalar_pow_tensor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_complex_scalar_pow_tensor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_complex_scalar_pow_tensor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_complex_scalar_pow_tensor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_complex_scalar_pow_tensor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_complex_scalar_pow_tensor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_complex_scalar_pow_tensor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___radd___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___radd___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___radd___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___radd___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___radd___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___radd___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___radd___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___radd___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___radd___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___radd___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___radd___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___radd___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rand___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rand___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rand___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rand___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rand___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rand___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rdiv___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rdiv___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rdiv___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rdiv___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rdiv___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rdiv___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rdiv___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rdiv___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rdiv___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rdiv___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rdiv___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rdiv___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rmod___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rmod___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rmod___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rmod___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rmod___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rmod___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rmod___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rmod___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rmod___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rmul___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rmul___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rmul___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rmul___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rmul___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rmul___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rmul___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rmul___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rmul___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rmul___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rmul___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rmul___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___ror___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___ror___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___ror___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___ror___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___ror___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___ror___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rpow___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rpow___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rpow___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rpow___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rpow___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rpow___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rpow___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rpow___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rpow___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rpow___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rpow___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rsub___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rsub___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rsub___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rsub___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rsub___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rsub___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rsub___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rsub___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rsub___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rsub___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rsub___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rxor___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rxor___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rxor___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rxor___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rxor___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rxor___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs__conversions_complex_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs__conversions_complex_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs__conversions_complex_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs__conversions_polar_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs__conversions_polar_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_add_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_add_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_add_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_add_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_add_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_atan2_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_atan2_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_atan2_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_atan2_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_atan2_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_atan2_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_atan2_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_atan2_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_atan2_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_atan2_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_left_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_left_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_left_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_left_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_left_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_right_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_right_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_right_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_right_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_right_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_clamp_max_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_clamp_max_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_clamp_max_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_clamp_max_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_clamp_min_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_clamp_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_clamp_min_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_clamp_min_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_copysign_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_copysign_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_copysign_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_copysign_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_copysign_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_copysign_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_copysign_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_copysign_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_copysign_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_copysign_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_floor_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_floor_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_floor_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_floor_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_floor_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_floor_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_floor_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_floor_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_floor_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_no_rounding_mode_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_no_rounding_mode_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_no_rounding_mode_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_no_rounding_mode_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_no_rounding_mode_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_no_rounding_mode_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_no_rounding_mode_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_no_rounding_mode_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_no_rounding_mode_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_no_rounding_mode_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_no_rounding_mode_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_no_rounding_mode_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_no_rounding_mode_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_trunc_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_trunc_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_trunc_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_trunc_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_trunc_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_trunc_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_trunc_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_trunc_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_trunc_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_eq_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_eq_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_eq_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_eq_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_eq_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_float_power_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_float_power_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_float_power_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_float_power_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_floor_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_floor_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_floor_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmax_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmax_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmax_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmax_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmax_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmax_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmax_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmax_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmax_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmax_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmin_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmin_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmin_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmin_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmin_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmin_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmin_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmin_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmin_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmin_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmod_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmod_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmod_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmod_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_gcd_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_gcd_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ge_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ge_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ge_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_gt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_gt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_gt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_gt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_gt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_heaviside_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_heaviside_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_heaviside_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_heaviside_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_hypot_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_hypot_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_hypot_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_hypot_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_igamma_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_igamma_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_igammac_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_igammac_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_isclose_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_isclose_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_isclose_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_isclose_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_isclose_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_lcm_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_lcm_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_le_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_le_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_le_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_le_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logaddexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logaddexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logaddexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logaddexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_lt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_lt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_lt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_lt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_maximum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_maximum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_maximum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_maximum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_maximum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_minimum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_minimum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_minimum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_minimum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_mul_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_mul_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_mul_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_mul_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_mul_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_mul_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_mul_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_mul_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_mul_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_mul_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_mul_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_mul_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_mul_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ne_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ne_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ne_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_nextafter_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_nextafter_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_nextafter_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_nextafter_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_pow_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_pow_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_pow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_remainder_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_remainder_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_rsub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_rsub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_rsub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_rsub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_rsub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_rsub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_rsub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_rsub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_rsub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_rsub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_rsub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_special_xlog1py_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_special_xlog1py_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_special_xlog1py_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_special_xlog1py_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_special_xlog1py_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_special_xlog1py_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_special_xlog1py_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_special_xlog1py_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_special_xlog1py_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_special_xlog1py_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_special_zeta_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_special_zeta_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_special_zeta_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_special_zeta_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_special_zeta_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_special_zeta_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_special_zeta_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_special_zeta_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_sub_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_sub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_sub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_true_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_true_divide_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_true_divide_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_true_divide_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_true_divide_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_true_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_true_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_true_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_true_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_true_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_true_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_true_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_true_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_xlogy_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_xlogy_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_xlogy_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_xlogy_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_xlogy_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_xlogy_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_xlogy_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_xlogy_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_xlogy_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_xlogy_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_add_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_add_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_add_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_add_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_add_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_atan2_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_atan2_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_atan2_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_atan2_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_atan2_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_atan2_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_atan2_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_atan2_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_atan2_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_atan2_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_left_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_left_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_left_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_left_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_left_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_right_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_right_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_right_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_right_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_right_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_clamp_max_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_clamp_max_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_clamp_max_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_clamp_max_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_clamp_min_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_clamp_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_clamp_min_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_clamp_min_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_complex_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_complex_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_complex_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_copysign_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_copysign_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_copysign_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_copysign_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_copysign_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_copysign_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_copysign_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_copysign_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_copysign_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_copysign_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_floor_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_floor_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_floor_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_floor_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_floor_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_floor_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_floor_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_floor_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_floor_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_no_rounding_mode_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_no_rounding_mode_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_no_rounding_mode_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_no_rounding_mode_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_no_rounding_mode_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_no_rounding_mode_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_no_rounding_mode_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_no_rounding_mode_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_no_rounding_mode_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_no_rounding_mode_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_no_rounding_mode_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_no_rounding_mode_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_no_rounding_mode_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_trunc_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_trunc_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_trunc_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_trunc_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_trunc_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_trunc_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_trunc_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_trunc_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_trunc_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_eq_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_eq_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_eq_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_eq_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_eq_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_float_power_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_float_power_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_float_power_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_float_power_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_floor_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_floor_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_floor_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmax_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmax_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmax_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmax_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmax_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmax_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmax_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmax_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmax_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmax_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmin_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmin_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmin_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmin_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmin_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmin_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmin_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmin_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmin_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmin_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmod_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmod_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmod_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmod_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_gcd_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_gcd_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ge_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ge_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ge_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_gt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_gt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_gt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_gt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_gt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_heaviside_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_heaviside_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_heaviside_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_heaviside_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_hypot_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_hypot_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_hypot_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_hypot_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_igamma_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_igamma_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_igammac_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_igammac_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_isclose_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_isclose_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_isclose_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_isclose_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_isclose_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_return_by_ref_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_return_by_ref_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_return_by_ref_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_return_by_ref_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_return_by_ref_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_return_by_ref_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_return_by_ref_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_return_by_ref_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_return_by_ref_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_return_by_ref_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_return_by_ref_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_return_by_ref_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___radd___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___radd___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___radd___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___radd___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___radd___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___radd___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___radd___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___radd___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___radd___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___radd___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___radd___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___radd___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rand___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rand___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rand___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rand___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rand___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rand___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rdiv___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rdiv___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rdiv___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rdiv___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rdiv___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rdiv___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rdiv___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rdiv___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rdiv___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rdiv___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rdiv___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rdiv___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rmod___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rmod___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rmod___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rmod___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rmod___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rmod___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rmod___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rmod___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rmod___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rmul___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rmul___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rmul___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rmul___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rmul___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rmul___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rmul___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rmul___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rmul___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rmul___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rmul___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rmul___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___ror___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___ror___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___ror___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___ror___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___ror___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___ror___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rpow___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rpow___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rpow___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rpow___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rpow___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rpow___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rpow___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rpow___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rpow___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rpow___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rpow___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rsub___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rsub___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rsub___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rsub___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rsub___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rsub___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rsub___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rsub___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rsub___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rsub___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rsub___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rxor___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rxor___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rxor___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rxor___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rxor___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rxor___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs__conversions_complex_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs__conversions_complex_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs__conversions_complex_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs__conversions_polar_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs__conversions_polar_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_add_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_add_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_add_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_add_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_add_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_atan2_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_atan2_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_atan2_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_atan2_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_atan2_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_atan2_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_atan2_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_atan2_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_atan2_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_atan2_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_left_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_left_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_left_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_left_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_left_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_right_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_right_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_right_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_right_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_right_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_clamp_max_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_clamp_max_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_clamp_max_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_clamp_max_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_clamp_min_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_clamp_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_clamp_min_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_clamp_min_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_copysign_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_copysign_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_copysign_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_copysign_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_copysign_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_copysign_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_copysign_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_copysign_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_copysign_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_copysign_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_floor_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_floor_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_floor_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_floor_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_floor_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_floor_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_floor_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_floor_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_floor_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_no_rounding_mode_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_no_rounding_mode_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_no_rounding_mode_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_no_rounding_mode_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_no_rounding_mode_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_no_rounding_mode_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_no_rounding_mode_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_no_rounding_mode_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_no_rounding_mode_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_no_rounding_mode_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_no_rounding_mode_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_no_rounding_mode_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_no_rounding_mode_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_trunc_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_trunc_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_trunc_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_trunc_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_trunc_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_trunc_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_trunc_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_trunc_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_trunc_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_eq_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_eq_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_eq_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_eq_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_eq_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_float_power_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_float_power_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_float_power_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_float_power_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_floor_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_floor_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_floor_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmax_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmax_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmax_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmax_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmax_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmax_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmax_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmax_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmax_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmax_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmin_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmin_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmin_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmin_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmin_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmin_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmin_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmin_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmin_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmin_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmod_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmod_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmod_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmod_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_gcd_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_gcd_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ge_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ge_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ge_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_gt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_gt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_gt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_gt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_gt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_heaviside_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_heaviside_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_heaviside_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_heaviside_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_hypot_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_hypot_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_hypot_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_hypot_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_igamma_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_igamma_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_igammac_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_igammac_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_isclose_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_isclose_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_isclose_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_isclose_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_isclose_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_lcm_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_lcm_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_le_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_le_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_le_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_le_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logaddexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logaddexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logaddexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logaddexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_lt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_lt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_lt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_lt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_maximum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_maximum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_maximum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_maximum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_maximum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_minimum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_minimum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_minimum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_minimum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_mul_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_mul_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_mul_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_mul_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_mul_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_mul_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_mul_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_mul_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_mul_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_mul_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_mul_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_mul_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_mul_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ne_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ne_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ne_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_nextafter_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_nextafter_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_nextafter_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_nextafter_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_pow_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_pow_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_pow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_remainder_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_remainder_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_rsub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_rsub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_rsub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_rsub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_rsub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_rsub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_rsub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_rsub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_rsub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_rsub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_rsub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_special_xlog1py_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_special_xlog1py_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_special_xlog1py_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_special_xlog1py_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_special_xlog1py_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_special_xlog1py_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_special_xlog1py_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_special_xlog1py_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_special_xlog1py_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_special_xlog1py_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_special_zeta_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_special_zeta_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_special_zeta_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_special_zeta_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_special_zeta_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_special_zeta_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_special_zeta_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_special_zeta_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_sub_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_sub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_sub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_true_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_true_divide_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_true_divide_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_true_divide_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_true_divide_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_true_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_true_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_true_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_true_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_true_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_true_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_true_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_true_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_xlogy_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_xlogy_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_xlogy_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_xlogy_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_xlogy_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_xlogy_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_xlogy_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_xlogy_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_xlogy_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_xlogy_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_add_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_add_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_add_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_add_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_add_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_atan2_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_atan2_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_atan2_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_atan2_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_atan2_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_atan2_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_atan2_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_atan2_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_atan2_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_atan2_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_left_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_left_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_left_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_left_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_left_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_right_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_right_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_right_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_right_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_right_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_clamp_max_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_clamp_max_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_clamp_max_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_clamp_max_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_clamp_min_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_clamp_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_clamp_min_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_clamp_min_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_complex_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_complex_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_complex_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_copysign_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_copysign_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_copysign_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_copysign_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_copysign_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_copysign_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_copysign_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_copysign_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_copysign_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_copysign_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_floor_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_floor_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_floor_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_floor_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_floor_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_floor_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_floor_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_floor_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_floor_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_no_rounding_mode_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_no_rounding_mode_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_no_rounding_mode_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_no_rounding_mode_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_no_rounding_mode_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_no_rounding_mode_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_no_rounding_mode_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_no_rounding_mode_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_no_rounding_mode_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_no_rounding_mode_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_no_rounding_mode_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_no_rounding_mode_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_no_rounding_mode_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_trunc_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_trunc_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_trunc_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_trunc_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_trunc_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_trunc_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_trunc_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_trunc_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_trunc_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_eq_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_eq_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_eq_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_eq_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_eq_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_float_power_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_float_power_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_float_power_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_float_power_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_floor_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_floor_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_floor_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmax_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmax_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmax_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmax_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmax_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmax_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmax_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmax_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmax_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmax_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmin_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmin_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmin_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmin_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmin_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmin_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmin_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmin_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmin_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmin_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmod_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmod_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmod_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmod_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_gcd_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_gcd_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ge_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ge_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ge_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_gt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_gt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_gt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_gt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_gt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_heaviside_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_heaviside_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_heaviside_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_heaviside_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_hypot_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_hypot_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_hypot_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_hypot_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_igamma_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_igamma_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_igammac_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_igammac_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_isclose_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_isclose_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_isclose_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_isclose_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_isclose_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_return_by_ref_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_return_by_ref_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_return_by_ref_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_return_by_ref_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_return_by_ref_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_return_by_ref_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_return_by_ref_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_return_by_ref_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_return_by_ref_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_return_by_ref_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_return_by_ref_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_return_by_ref_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_lcm_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_lcm_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ldexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ldexp_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ldexp_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ldexp_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ldexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ldexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ldexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ldexp_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ldexp_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ldexp_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ldexp_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ldexp_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_le_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_le_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_le_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_le_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logaddexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logaddexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logaddexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logaddexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_lt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_lt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_lt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_lt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_max_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_max_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_max_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_max_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_max_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_max_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_max_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_max_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_max_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_max_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_maximum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_maximum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_maximum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_maximum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_maximum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_min_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_min_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_min_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_min_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_min_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_min_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_min_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_min_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_min_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_min_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_minimum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_minimum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_minimum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_minimum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_mul_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_mul_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_mul_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_mul_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_mul_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_mul_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_mul_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_mul_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_mul_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_mul_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_mul_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_mul_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_mul_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ne_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ne_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ne_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_nextafter_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_nextafter_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_nextafter_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_nextafter_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_polar_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_polar_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_pow_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_pow_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_pow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_remainder_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_remainder_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_rsub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_rsub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_rsub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_rsub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_rsub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_rsub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_rsub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_rsub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_rsub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_rsub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_rsub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_t_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_t_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_t_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_t_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_t_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_t_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_t_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_t_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_u_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_u_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_u_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_u_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_u_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_u_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_u_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_u_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_v_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_v_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_v_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_v_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_v_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_v_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_v_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_v_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_w_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_w_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_w_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_w_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_w_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_w_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_w_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_w_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_hermite_polynomial_h_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_hermite_polynomial_h_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_hermite_polynomial_h_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_hermite_polynomial_h_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_hermite_polynomial_h_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_hermite_polynomial_h_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_hermite_polynomial_h_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_hermite_polynomial_h_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_hermite_polynomial_he_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_hermite_polynomial_he_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_hermite_polynomial_he_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_hermite_polynomial_he_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_hermite_polynomial_he_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_hermite_polynomial_he_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_hermite_polynomial_he_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_hermite_polynomial_he_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_laguerre_polynomial_l_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_laguerre_polynomial_l_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_laguerre_polynomial_l_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_laguerre_polynomial_l_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_laguerre_polynomial_l_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_laguerre_polynomial_l_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_laguerre_polynomial_l_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_laguerre_polynomial_l_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_legendre_polynomial_p_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_legendre_polynomial_p_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_legendre_polynomial_p_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_legendre_polynomial_p_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_legendre_polynomial_p_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_legendre_polynomial_p_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_legendre_polynomial_p_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_legendre_polynomial_p_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_t_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_t_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_t_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_t_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_t_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_t_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_t_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_u_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_u_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_u_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_u_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_u_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_u_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_u_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_v_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_v_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_v_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_v_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_v_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_v_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_v_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_w_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_w_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_w_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_w_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_w_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_w_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_w_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_xlog1py_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_xlog1py_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_xlog1py_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_xlog1py_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_xlog1py_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_xlog1py_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_xlog1py_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_xlog1py_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_xlog1py_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_xlog1py_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_zeta_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_zeta_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_zeta_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_zeta_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_zeta_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_zeta_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_zeta_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_zeta_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_sub_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_sub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_sub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_true_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_true_divide_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_true_divide_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_true_divide_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_true_divide_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_true_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_true_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_true_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_true_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_true_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_true_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_true_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_true_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_xlogy_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_xlogy_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_xlogy_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_xlogy_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_xlogy_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_xlogy_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_xlogy_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_xlogy_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_xlogy_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_xlogy_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_lcm_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_lcm_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ldexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ldexp_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ldexp_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ldexp_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ldexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ldexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ldexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ldexp_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ldexp_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ldexp_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ldexp_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ldexp_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_le_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_le_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_le_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_le_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logaddexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logaddexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logaddexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logaddexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_lt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_lt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_lt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_lt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_max_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_max_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_max_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_max_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_max_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_max_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_max_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_max_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_max_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_max_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_maximum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_maximum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_maximum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_maximum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_maximum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_min_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_min_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_min_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_min_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_min_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_min_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_min_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_min_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_min_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_min_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_minimum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_minimum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_minimum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_minimum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_mul_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_mul_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_mul_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_mul_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_mul_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_mul_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_mul_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_mul_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_mul_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_mul_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_mul_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_mul_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_mul_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ne_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ne_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ne_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_nextafter_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_nextafter_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_nextafter_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_nextafter_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_polar_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_polar_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_pow_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_pow_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_pow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_remainder_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_remainder_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_rsub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_rsub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_rsub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_rsub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_rsub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_rsub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_rsub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_rsub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_rsub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_rsub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_rsub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_t_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_t_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_t_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_t_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_t_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_t_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_t_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_t_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_u_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_u_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_u_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_u_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_u_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_u_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_u_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_u_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_v_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_v_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_v_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_v_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_v_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_v_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_v_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_v_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_w_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_w_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_w_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_w_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_w_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_w_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_w_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_w_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_hermite_polynomial_h_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_hermite_polynomial_h_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_hermite_polynomial_h_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_hermite_polynomial_h_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_hermite_polynomial_h_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_hermite_polynomial_h_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_hermite_polynomial_h_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_hermite_polynomial_h_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_hermite_polynomial_he_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_hermite_polynomial_he_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_hermite_polynomial_he_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_hermite_polynomial_he_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_hermite_polynomial_he_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_hermite_polynomial_he_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_hermite_polynomial_he_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_hermite_polynomial_he_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_laguerre_polynomial_l_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_laguerre_polynomial_l_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_laguerre_polynomial_l_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_laguerre_polynomial_l_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_laguerre_polynomial_l_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_laguerre_polynomial_l_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_laguerre_polynomial_l_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_laguerre_polynomial_l_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_legendre_polynomial_p_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_legendre_polynomial_p_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_legendre_polynomial_p_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_legendre_polynomial_p_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_legendre_polynomial_p_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_legendre_polynomial_p_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_legendre_polynomial_p_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_legendre_polynomial_p_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_t_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_t_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_t_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_t_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_t_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_t_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_t_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_u_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_u_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_u_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_u_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_u_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_u_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_u_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_v_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_v_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_v_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_v_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_v_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_v_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_v_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_w_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_w_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_w_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_w_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_w_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_w_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_w_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_xlog1py_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_xlog1py_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_xlog1py_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_xlog1py_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_xlog1py_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_xlog1py_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_xlog1py_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_xlog1py_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_xlog1py_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_xlog1py_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_zeta_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_zeta_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_zeta_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_zeta_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_zeta_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_zeta_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_zeta_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_zeta_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_sub_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_sub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_sub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_true_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_true_divide_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_true_divide_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_true_divide_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_true_divide_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_true_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_true_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_true_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_true_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_true_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_true_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_true_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_true_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_xlogy_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_xlogy_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_xlogy_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_xlogy_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_xlogy_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_xlogy_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_xlogy_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_xlogy_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_xlogy_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_xlogy_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___radd___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___radd___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___radd___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___radd___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___radd___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___radd___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___radd___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___radd___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___radd___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___radd___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___radd___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___radd___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rand___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rand___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rand___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rand___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rand___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rand___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rdiv___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rdiv___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rdiv___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rdiv___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rdiv___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rdiv___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rdiv___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rdiv___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rdiv___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rdiv___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rdiv___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rdiv___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rmod___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rmod___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rmod___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rmod___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rmod___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rmod___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rmod___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rmod___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rmod___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rmul___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rmul___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rmul___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rmul___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rmul___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rmul___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rmul___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rmul___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rmul___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rmul___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rmul___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rmul___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___ror___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___ror___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___ror___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___ror___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___ror___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___ror___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rpow___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rpow___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rpow___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rpow___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rpow___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rpow___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rpow___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rpow___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rpow___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rpow___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rpow___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rsub___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rsub___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rsub___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rsub___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rsub___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rsub___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rsub___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rsub___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rsub___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rsub___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rsub___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rxor___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rxor___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rxor___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rxor___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rxor___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rxor___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs__conversions_complex_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs__conversions_complex_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs__conversions_complex_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs__conversions_polar_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs__conversions_polar_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_add_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_add_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_add_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_add_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_add_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_atan2_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_atan2_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_atan2_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_atan2_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_atan2_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_atan2_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_atan2_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_atan2_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_atan2_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_atan2_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_left_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_left_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_left_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_left_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_left_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_right_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_right_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_right_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_right_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_right_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_clamp_max_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_clamp_max_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_clamp_max_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_clamp_max_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_clamp_min_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_clamp_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_clamp_min_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_clamp_min_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_copysign_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_copysign_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_copysign_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_copysign_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_copysign_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_copysign_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_copysign_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_copysign_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_copysign_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_copysign_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_floor_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_floor_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_floor_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_floor_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_floor_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_floor_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_floor_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_floor_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_floor_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_no_rounding_mode_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_no_rounding_mode_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_no_rounding_mode_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_no_rounding_mode_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_no_rounding_mode_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_no_rounding_mode_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_no_rounding_mode_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_no_rounding_mode_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_no_rounding_mode_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_no_rounding_mode_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_no_rounding_mode_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_no_rounding_mode_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_no_rounding_mode_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_trunc_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_trunc_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_trunc_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_trunc_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_trunc_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_trunc_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_trunc_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_trunc_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_trunc_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_eq_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_eq_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_eq_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_eq_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_eq_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_float_power_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_float_power_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_float_power_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_float_power_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_floor_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_floor_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_floor_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmax_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmax_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmax_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmax_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmax_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmax_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmax_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmax_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmax_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmax_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmin_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmin_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmin_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmin_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmin_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmin_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmin_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmin_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmin_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmin_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmod_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmod_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmod_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmod_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_gcd_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_gcd_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ge_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ge_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ge_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_gt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_gt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_gt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_gt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_gt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_heaviside_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_heaviside_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_heaviside_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_heaviside_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_hypot_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_hypot_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_hypot_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_hypot_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_igamma_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_igamma_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_igammac_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_igammac_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_isclose_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_isclose_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_isclose_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_isclose_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_isclose_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_lcm_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_lcm_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_le_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_le_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_le_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_le_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logaddexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logaddexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logaddexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logaddexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_lt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_lt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_lt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_lt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_maximum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_maximum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_maximum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_maximum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_maximum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_minimum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_minimum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_minimum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_minimum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_mul_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_mul_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_mul_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_mul_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_mul_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_mul_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_mul_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_mul_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_mul_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_mul_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_mul_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_mul_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_mul_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ne_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ne_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ne_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_nextafter_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_nextafter_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_nextafter_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_nextafter_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_pow_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_pow_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_pow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_remainder_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_remainder_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_rsub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_rsub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_rsub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_rsub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_rsub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_rsub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_rsub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_rsub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_rsub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_rsub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_rsub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_special_xlog1py_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_special_xlog1py_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_special_xlog1py_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_special_xlog1py_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_special_xlog1py_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_special_xlog1py_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_special_xlog1py_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_special_xlog1py_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_special_xlog1py_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_special_xlog1py_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_special_zeta_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_special_zeta_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_special_zeta_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_special_zeta_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_special_zeta_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_special_zeta_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_special_zeta_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_special_zeta_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_sub_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_sub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_sub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_true_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_true_divide_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_true_divide_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_true_divide_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_true_divide_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_true_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_true_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_true_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_true_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_true_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_true_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_true_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_true_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_xlogy_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_xlogy_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_xlogy_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_xlogy_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_xlogy_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_xlogy_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_xlogy_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_xlogy_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_xlogy_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_xlogy_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_add_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_add_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_add_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_add_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_add_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_atan2_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_atan2_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_atan2_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_atan2_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_atan2_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_atan2_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_atan2_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_atan2_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_atan2_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_atan2_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_left_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_left_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_left_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_left_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_left_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_right_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_right_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_right_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_right_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_right_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_clamp_max_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_clamp_max_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_clamp_max_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_clamp_max_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_clamp_min_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_clamp_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_clamp_min_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_clamp_min_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_complex_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_complex_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_complex_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_copysign_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_copysign_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_copysign_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_copysign_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_copysign_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_copysign_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_copysign_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_copysign_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_copysign_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_copysign_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_floor_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_floor_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_floor_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_floor_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_floor_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_floor_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_floor_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_floor_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_floor_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_no_rounding_mode_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_no_rounding_mode_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_no_rounding_mode_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_no_rounding_mode_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_no_rounding_mode_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_no_rounding_mode_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_no_rounding_mode_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_no_rounding_mode_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_no_rounding_mode_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_no_rounding_mode_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_no_rounding_mode_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_no_rounding_mode_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_no_rounding_mode_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_trunc_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_trunc_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_trunc_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_trunc_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_trunc_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_trunc_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_trunc_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_trunc_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_trunc_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_eq_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_eq_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_eq_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_eq_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_eq_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_float_power_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_float_power_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_float_power_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_float_power_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_floor_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_floor_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_floor_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmax_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmax_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmax_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmax_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmax_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmax_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmax_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmax_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmax_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmax_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmin_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmin_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmin_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmin_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmin_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmin_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmin_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmin_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmin_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmin_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmod_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmod_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmod_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmod_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_gcd_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_gcd_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ge_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ge_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ge_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_gt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_gt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_gt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_gt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_gt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_heaviside_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_heaviside_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_heaviside_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_heaviside_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_hypot_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_hypot_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_hypot_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_hypot_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_igamma_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_igamma_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_igammac_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_igammac_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_isclose_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_isclose_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_isclose_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_isclose_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_isclose_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_return_by_ref_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_return_by_ref_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_return_by_ref_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_return_by_ref_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_return_by_ref_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_return_by_ref_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_return_by_ref_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_return_by_ref_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_return_by_ref_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_return_by_ref_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_return_by_ref_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_return_by_ref_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_lcm_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_lcm_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ldexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ldexp_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ldexp_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ldexp_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ldexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ldexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ldexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ldexp_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ldexp_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ldexp_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ldexp_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ldexp_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_le_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_le_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_le_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_le_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logaddexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logaddexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logaddexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logaddexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_lt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_lt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_lt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_lt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_max_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_max_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_max_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_max_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_max_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_max_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_max_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_max_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_max_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_max_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_maximum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_maximum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_maximum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_maximum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_maximum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_min_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_min_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_min_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_min_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_min_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_min_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_min_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_min_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_min_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_min_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_minimum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_minimum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_minimum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_minimum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_mul_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_mul_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_mul_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_mul_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_mul_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_mul_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_mul_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_mul_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_mul_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_mul_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_mul_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_mul_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_mul_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ne_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ne_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ne_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_nextafter_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_nextafter_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_nextafter_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_nextafter_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_polar_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_polar_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_pow_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_pow_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_pow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_remainder_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_remainder_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_rsub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_rsub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_rsub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_rsub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_rsub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_rsub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_rsub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_rsub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_rsub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_rsub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_rsub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_t_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_t_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_t_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_t_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_t_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_t_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_t_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_t_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_u_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_u_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_u_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_u_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_u_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_u_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_u_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_u_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_v_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_v_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_v_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_v_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_v_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_v_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_v_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_v_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_w_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_w_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_w_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_w_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_w_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_w_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_w_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_w_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_hermite_polynomial_h_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_hermite_polynomial_h_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_hermite_polynomial_h_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_hermite_polynomial_h_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_hermite_polynomial_h_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_hermite_polynomial_h_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_hermite_polynomial_h_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_hermite_polynomial_h_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_hermite_polynomial_he_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_hermite_polynomial_he_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_hermite_polynomial_he_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_hermite_polynomial_he_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_hermite_polynomial_he_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_hermite_polynomial_he_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_hermite_polynomial_he_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_hermite_polynomial_he_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_laguerre_polynomial_l_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_laguerre_polynomial_l_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_laguerre_polynomial_l_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_laguerre_polynomial_l_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_laguerre_polynomial_l_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_laguerre_polynomial_l_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_laguerre_polynomial_l_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_laguerre_polynomial_l_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_legendre_polynomial_p_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_legendre_polynomial_p_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_legendre_polynomial_p_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_legendre_polynomial_p_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_legendre_polynomial_p_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_legendre_polynomial_p_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_legendre_polynomial_p_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_legendre_polynomial_p_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_t_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_t_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_t_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_t_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_t_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_t_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_t_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_u_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_u_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_u_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_u_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_u_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_u_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_u_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_v_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_v_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_v_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_v_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_v_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_v_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_v_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_w_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_w_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_w_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_w_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_w_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_w_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_w_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_xlog1py_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_xlog1py_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_xlog1py_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_xlog1py_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_xlog1py_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_xlog1py_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_xlog1py_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_xlog1py_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_xlog1py_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_xlog1py_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_zeta_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_zeta_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_zeta_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_zeta_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_zeta_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_zeta_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_zeta_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_zeta_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_sub_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_sub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_sub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_true_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_true_divide_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_true_divide_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_true_divide_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_true_divide_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_true_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_true_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_true_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_true_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_true_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_true_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_true_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_true_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_xlogy_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_xlogy_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_xlogy_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_xlogy_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_xlogy_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_xlogy_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_xlogy_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_xlogy_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_xlogy_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_xlogy_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___radd___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___radd___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___radd___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___radd___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___radd___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___radd___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___radd___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___radd___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___radd___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___radd___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___radd___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___radd___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rand___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rand___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rand___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rand___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rand___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rand___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rdiv___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rdiv___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rdiv___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rdiv___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rdiv___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rdiv___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rdiv___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rdiv___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rdiv___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rdiv___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rdiv___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rdiv___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rmod___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rmod___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rmod___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rmod___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rmod___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rmod___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rmod___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rmod___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rmod___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rmul___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rmul___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rmul___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rmul___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rmul___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rmul___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rmul___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rmul___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rmul___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rmul___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rmul___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rmul___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___ror___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___ror___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___ror___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___ror___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___ror___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___ror___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rpow___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rpow___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rpow___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rpow___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rpow___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rpow___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rpow___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rpow___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rpow___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rpow___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rpow___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rsub___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rsub___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rsub___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rsub___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rsub___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rsub___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rsub___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rsub___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rsub___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rsub___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rsub___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rxor___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rxor___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rxor___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rxor___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rxor___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rxor___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs__conversions_complex_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs__conversions_complex_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs__conversions_complex_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs__conversions_polar_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs__conversions_polar_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_add_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_add_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_add_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_add_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_add_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_atan2_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_atan2_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_atan2_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_atan2_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_atan2_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_atan2_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_atan2_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_atan2_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_atan2_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_atan2_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_left_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_left_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_left_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_left_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_left_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_right_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_right_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_right_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_right_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_right_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_clamp_max_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_clamp_max_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_clamp_max_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_clamp_max_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_clamp_min_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_clamp_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_clamp_min_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_clamp_min_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_copysign_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_copysign_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_copysign_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_copysign_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_copysign_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_copysign_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_copysign_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_copysign_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_copysign_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_copysign_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_floor_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_floor_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_floor_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_floor_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_floor_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_floor_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_floor_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_floor_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_floor_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_no_rounding_mode_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_no_rounding_mode_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_no_rounding_mode_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_no_rounding_mode_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_no_rounding_mode_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_no_rounding_mode_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_no_rounding_mode_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_no_rounding_mode_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_no_rounding_mode_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_no_rounding_mode_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_no_rounding_mode_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_no_rounding_mode_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_no_rounding_mode_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_trunc_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_trunc_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_trunc_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_trunc_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_trunc_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_trunc_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_trunc_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_trunc_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_trunc_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_eq_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_eq_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_eq_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_eq_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_eq_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_float_power_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_float_power_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_float_power_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_float_power_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_floor_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_floor_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_floor_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmax_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmax_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmax_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmax_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmax_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmax_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmax_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmax_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmax_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmax_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmin_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmin_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmin_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmin_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmin_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmin_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmin_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmin_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmin_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmin_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmod_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmod_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmod_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmod_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_gcd_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_gcd_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ge_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ge_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ge_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_gt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_gt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_gt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_gt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_gt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_heaviside_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_heaviside_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_heaviside_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_heaviside_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_hypot_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_hypot_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_hypot_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_hypot_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_igamma_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_igamma_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_igammac_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_igammac_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_isclose_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_isclose_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_isclose_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_isclose_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_isclose_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_lcm_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_lcm_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_le_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_le_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_le_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_le_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logaddexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logaddexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logaddexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logaddexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_lt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_lt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_lt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_lt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_maximum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_maximum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_maximum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_maximum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_maximum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_minimum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_minimum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_minimum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_minimum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_mul_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_mul_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_mul_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_mul_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_mul_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_mul_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_mul_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_mul_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_mul_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_mul_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_mul_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_mul_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_mul_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ne_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ne_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ne_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_nextafter_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_nextafter_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_nextafter_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_nextafter_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_pow_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_pow_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_pow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_remainder_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_remainder_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_rsub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_rsub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_rsub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_rsub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_rsub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_rsub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_rsub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_rsub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_rsub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_rsub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_rsub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_special_xlog1py_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_special_xlog1py_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_special_xlog1py_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_special_xlog1py_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_special_xlog1py_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_special_xlog1py_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_special_xlog1py_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_special_xlog1py_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_special_xlog1py_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_special_xlog1py_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_special_zeta_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_special_zeta_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_special_zeta_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_special_zeta_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_special_zeta_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_special_zeta_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_special_zeta_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_special_zeta_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_sub_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_sub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_sub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_true_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_true_divide_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_true_divide_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_true_divide_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_true_divide_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_true_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_true_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_true_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_true_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_true_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_true_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_true_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_true_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_xlogy_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_xlogy_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_xlogy_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_xlogy_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_xlogy_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_xlogy_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_xlogy_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_xlogy_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_xlogy_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_xlogy_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_add_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_add_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_add_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_add_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_add_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_atan2_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_atan2_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_atan2_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_atan2_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_atan2_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_atan2_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_atan2_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_atan2_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_atan2_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_atan2_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_left_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_left_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_left_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_left_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_left_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_right_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_right_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_right_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_right_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_right_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_clamp_max_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_clamp_max_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_clamp_max_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_clamp_max_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_clamp_min_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_clamp_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_clamp_min_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_clamp_min_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_complex_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_complex_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_complex_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_copysign_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_copysign_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_copysign_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_copysign_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_copysign_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_copysign_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_copysign_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_copysign_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_copysign_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_copysign_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_floor_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_floor_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_floor_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_floor_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_floor_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_floor_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_floor_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_floor_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_floor_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_no_rounding_mode_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_no_rounding_mode_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_no_rounding_mode_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_no_rounding_mode_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_no_rounding_mode_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_no_rounding_mode_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_no_rounding_mode_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_no_rounding_mode_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_no_rounding_mode_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_no_rounding_mode_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_no_rounding_mode_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_no_rounding_mode_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_no_rounding_mode_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_trunc_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_trunc_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_trunc_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_trunc_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_trunc_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_trunc_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_trunc_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_trunc_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_trunc_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_eq_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_eq_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_eq_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_eq_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_eq_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_float_power_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_float_power_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_float_power_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_float_power_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_floor_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_floor_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_floor_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmax_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmax_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmax_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmax_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmax_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmax_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmax_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmax_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmax_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmax_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmin_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmin_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmin_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmin_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmin_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmin_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmin_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmin_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmin_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmin_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmod_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmod_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmod_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmod_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_gcd_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_gcd_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ge_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ge_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ge_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_gt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_gt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_gt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_gt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_gt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_heaviside_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_heaviside_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_heaviside_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_heaviside_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_hypot_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_hypot_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_hypot_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_hypot_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_igamma_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_igamma_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_igammac_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_igammac_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_isclose_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_isclose_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_isclose_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_isclose_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_isclose_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_return_by_ref_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_return_by_ref_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_return_by_ref_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_return_by_ref_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_return_by_ref_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_return_by_ref_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_return_by_ref_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_return_by_ref_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_return_by_ref_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_return_by_ref_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_return_by_ref_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_return_by_ref_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_lcm_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_lcm_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ldexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ldexp_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ldexp_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ldexp_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ldexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ldexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ldexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ldexp_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ldexp_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ldexp_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ldexp_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ldexp_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_le_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_le_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_le_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_le_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logaddexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logaddexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logaddexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logaddexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_lt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_lt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_lt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_lt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_max_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_max_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_max_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_max_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_max_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_max_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_max_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_max_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_max_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_max_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_maximum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_maximum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_maximum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_maximum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_maximum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_min_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_min_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_min_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_min_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_min_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_min_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_min_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_min_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_min_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_min_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_minimum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_minimum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_minimum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_minimum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_mul_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_mul_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_mul_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_mul_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_mul_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_mul_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_mul_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_mul_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_mul_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_mul_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_mul_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_mul_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_mul_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ne_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ne_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ne_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_nextafter_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_nextafter_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_nextafter_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_nextafter_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_polar_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_polar_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_pow_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_pow_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_pow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_remainder_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_remainder_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_rsub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_rsub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_rsub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_rsub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_rsub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_rsub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_rsub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_rsub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_rsub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_rsub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_rsub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_t_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_t_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_t_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_t_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_t_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_t_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_t_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_t_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_u_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_u_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_u_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_u_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_u_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_u_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_u_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_u_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_v_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_v_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_v_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_v_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_v_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_v_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_v_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_v_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_w_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_w_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_w_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_w_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_w_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_w_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_w_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_w_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_hermite_polynomial_h_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_hermite_polynomial_h_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_hermite_polynomial_h_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_hermite_polynomial_h_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_hermite_polynomial_h_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_hermite_polynomial_h_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_hermite_polynomial_h_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_hermite_polynomial_h_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_hermite_polynomial_he_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_hermite_polynomial_he_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_hermite_polynomial_he_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_hermite_polynomial_he_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_hermite_polynomial_he_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_hermite_polynomial_he_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_hermite_polynomial_he_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_hermite_polynomial_he_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_laguerre_polynomial_l_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_laguerre_polynomial_l_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_laguerre_polynomial_l_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_laguerre_polynomial_l_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_laguerre_polynomial_l_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_laguerre_polynomial_l_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_laguerre_polynomial_l_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_laguerre_polynomial_l_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_legendre_polynomial_p_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_legendre_polynomial_p_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_legendre_polynomial_p_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_legendre_polynomial_p_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_legendre_polynomial_p_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_legendre_polynomial_p_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_legendre_polynomial_p_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_legendre_polynomial_p_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_t_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_t_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_t_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_t_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_t_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_t_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_t_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_u_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_u_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_u_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_u_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_u_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_u_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_u_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_v_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_v_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_v_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_v_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_v_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_v_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_v_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_w_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_w_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_w_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_w_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_w_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_w_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_w_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_xlog1py_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_xlog1py_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_xlog1py_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_xlog1py_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_xlog1py_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_xlog1py_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_xlog1py_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_xlog1py_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_xlog1py_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_xlog1py_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_zeta_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_zeta_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_zeta_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_zeta_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_zeta_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_zeta_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_zeta_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_zeta_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_sub_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_sub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_sub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_true_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_true_divide_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_true_divide_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_true_divide_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_true_divide_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_true_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_true_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_true_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_true_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_true_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_true_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_true_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_true_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_xlogy_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_xlogy_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_xlogy_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_xlogy_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_xlogy_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_xlogy_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_xlogy_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_xlogy_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_xlogy_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_xlogy_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_bfloat16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_bfloat16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_bfloat16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_bfloat16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_bfloat16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_bfloat16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_bfloat16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_bfloat16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_bfloat16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_bfloat16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_bool_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_bool_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_bool_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_bool_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_bool_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_bool_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_bool_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_bool_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_bool_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_bool_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float32_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float32_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float32_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float32_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float32_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float32_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float32_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float32_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float32_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float32_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int32_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int32_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int32_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int32_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int32_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int32_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int32_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int32_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int32_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int32_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int8_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int8_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int8_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int8_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int8_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int8_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int8_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int8_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int8_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int8_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_uint8_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_uint8_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_uint8_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_uint8_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_uint8_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_uint8_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_uint8_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_uint8_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_uint8_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_uint8_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_subgradient_cuda_bfloat16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_subgradient_cuda_bfloat16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_subgradient_cuda_bfloat16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_subgradient_cuda_bfloat16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_subgradient_cuda_float16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_subgradient_cuda_float16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_subgradient_cuda_float16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_subgradient_cuda_float16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_subgradient_cuda_float32_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_subgradient_cuda_float32_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_subgradient_cuda_float32_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_subgradient_cuda_float32_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_subgradient_cuda_float64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_subgradient_cuda_float64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_subgradient_cuda_float64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_subgradient_cuda_float64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_cpow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_cpu_tensor_pow_cuda_scalar_tensor_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_cremainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_cross_device_binary_ops_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_cross_device_inplace_error_msg_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_csub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_cuda_tensor_pow_scalar_tensor_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_cumulative_trapezoid_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_and_floordiv_script_vs_python_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_and_floordiv_vs_python_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_rounding_modes_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_rounding_modes_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_rounding_modes_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_rounding_modes_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_rounding_modes_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_rounding_modes_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_rounding_modes_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_rounding_modes_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_rounding_modes_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_rounding_nonfinite_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_rounding_nonfinite_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_rounding_nonfinite_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_rounding_nonfinite_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_rounding_numpy_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_rounding_numpy_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_rounding_numpy_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_rounding_numpy_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_rounding_numpy_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_rounding_numpy_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_rounding_numpy_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_rounding_numpy_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_divide_by_zero_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_divide_by_zero_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_divide_by_zero_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_divide_by_zero_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_divmul_scalar_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_bfloat16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_bfloat16_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_bfloat16_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_bfloat16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_bfloat16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_bfloat16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_bfloat16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_bfloat16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_bfloat16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_bfloat16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_bfloat16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex128_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex128_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex128_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex128_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex128_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex128_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex128_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex128_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex128_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex128_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex128_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex64_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex64_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float16_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float16_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float32_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float32_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float32_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float32_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float32_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float32_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float32_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float32_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float32_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float32_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float32_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float64_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float64_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int16_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int16_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int32_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int32_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int32_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int32_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int32_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int32_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int32_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int32_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int32_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int32_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int32_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int64_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int64_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int8_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int8_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int8_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int8_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int8_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int8_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int8_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int8_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int8_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int8_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int8_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_uint8_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_uint8_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_uint8_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_uint8_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_uint8_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_uint8_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_uint8_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_uint8_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_uint8_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_uint8_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_uint8_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_exceptions_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_scalar_pow_float_tensor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_scalar_pow_float_tensor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_div_extremal_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_div_extremal_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_div_extremal_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_div_extremal_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_scalar_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_scalar_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_scalar_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_scalar_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_scalar_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_scalar_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_scalar_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_scalar_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_tensor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_tensor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_tensor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_tensor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_tensor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_tensor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_tensor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_tensor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_zero_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_zero_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_zero_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_zero_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_zero_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_fmod_remainder_by_zero_float_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_fmod_remainder_by_zero_float_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_fmod_remainder_by_zero_float_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_fmod_remainder_by_zero_integral_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_fmod_remainder_by_zero_integral_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_fmod_remainder_by_zero_integral_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_fmod_remainder_by_zero_integral_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_fmod_remainder_by_zero_integral_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_fmod_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_fmod_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_fmod_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_fmod_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_fmod_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_fmod_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_fmod_remainder_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_fmod_remainder_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_gcd_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_gcd_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_complex_cuda_complex128_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_complex_cuda_complex128_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_complex_cuda_complex64_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_complex_cuda_complex64_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cross_device_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_bfloat16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_bfloat16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_bfloat16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_bfloat16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_bfloat16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_bfloat16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_bfloat16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_bfloat16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_bfloat16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_bfloat16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_bool_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_bool_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_bool_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_bool_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_bool_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_bool_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_bool_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_bool_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_bool_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_bool_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float32_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float32_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float32_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float32_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float32_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float32_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float32_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float32_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float32_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float32_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int32_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int32_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int32_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int32_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int32_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int32_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int32_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int32_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int32_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int32_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int8_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int8_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int8_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int8_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int8_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int8_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int8_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int8_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int8_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int8_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_uint8_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_uint8_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_uint8_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_uint8_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_uint8_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_uint8_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_uint8_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_uint8_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_uint8_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_uint8_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_hypot_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_hypot_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_idiv_and_ifloordiv_vs_python_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_inplace_division_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_inplace_dunders_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_int_and_float_pow_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_int_tensor_pow_neg_ints_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_ldexp_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_lerp_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_lerp_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_lerp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_lerp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_lerp_lowp_cpu_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_lerp_lowp_cpu_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_lerp_lowp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_lerp_lowp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_lerp_weight_scalar_tensor_promotion_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_lerp_weight_scalar_tensor_promotion_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_lerp_weight_scalar_tensor_promotion_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_lerp_weight_scalar_tensor_promotion_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_lerp_weight_tensor_promotion_error_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_lerp_weight_tensor_promotion_error_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_lerp_weight_tensor_promotion_error_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logaddexp2_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logaddexp2_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logaddexp2_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logaddexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logaddexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logaddexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bfloat16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bfloat16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bfloat16_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bfloat16_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bfloat16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bfloat16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bfloat16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bfloat16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bfloat16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bfloat16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bfloat16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bfloat16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bool_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bool_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bool_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bool_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bool_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bool_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bool_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bool_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bool_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bool_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bool_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bool_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex128_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex128_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex128_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex128_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex128_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex128_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex128_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex128_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex128_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex128_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex128_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex128_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex64_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex64_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float16_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float16_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float32_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float32_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float32_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float32_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float32_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float32_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float32_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float32_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float32_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float32_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float32_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float32_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float64_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float64_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int16_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int16_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int32_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int32_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int32_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int32_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int32_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int32_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int32_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int32_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int32_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int32_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int32_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int32_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int64_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int64_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int8_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int8_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int8_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int8_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int8_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int8_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int8_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int8_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int8_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int8_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int8_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int8_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_uint8_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_uint8_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_uint8_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_uint8_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_uint8_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_uint8_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_uint8_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_uint8_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_uint8_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_uint8_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_uint8_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_uint8_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bfloat16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bfloat16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bfloat16_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bfloat16_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bfloat16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bfloat16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bfloat16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bfloat16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bfloat16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bfloat16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bfloat16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bfloat16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bool_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bool_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bool_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bool_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bool_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bool_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bool_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bool_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bool_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bool_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bool_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bool_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex128_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex128_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex128_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex128_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex128_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex128_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex128_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex128_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex128_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex128_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex128_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex128_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex64_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex64_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float16_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float16_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float32_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float32_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float32_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float32_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float32_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float32_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float32_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float32_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float32_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float32_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float32_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float32_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float64_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float64_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int16_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int16_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int32_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int32_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int32_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int32_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int32_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int32_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int32_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int32_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int32_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int32_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int32_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int32_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int64_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int64_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int8_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int8_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int8_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int8_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int8_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int8_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int8_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int8_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int8_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int8_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int8_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int8_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_uint8_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_uint8_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_uint8_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_uint8_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_uint8_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_uint8_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_uint8_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_uint8_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_uint8_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_uint8_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_uint8_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_uint8_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bfloat16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bfloat16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bfloat16_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bfloat16_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bfloat16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bfloat16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bfloat16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bfloat16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bfloat16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bfloat16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bfloat16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bfloat16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bool_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bool_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bool_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bool_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bool_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bool_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bool_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bool_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bool_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bool_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bool_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bool_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex128_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex128_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex128_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex128_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex128_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex128_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex128_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex128_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex128_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex128_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex128_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex128_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex64_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex64_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float16_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float16_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float32_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float32_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float32_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float32_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float32_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float32_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float32_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float32_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float32_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float32_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float32_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float32_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float64_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float64_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int16_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int16_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int32_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int32_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int32_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int32_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int32_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int32_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int32_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int32_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int32_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int32_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int32_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int32_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int64_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int64_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int8_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int8_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int8_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int8_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int8_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int8_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int8_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int8_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int8_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int8_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int8_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int8_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_uint8_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_uint8_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_uint8_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_uint8_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_uint8_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_uint8_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_uint8_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_uint8_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_uint8_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_uint8_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_uint8_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_uint8_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_with_nontrivial_alignment_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_long_tensor_pow_floats_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_and_minimum_subgradient_cuda_bfloat16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_and_minimum_subgradient_cuda_bfloat16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_and_minimum_subgradient_cuda_bfloat16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_and_minimum_subgradient_cuda_bfloat16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_and_minimum_subgradient_cuda_float16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_and_minimum_subgradient_cuda_float16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_and_minimum_subgradient_cuda_float16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_and_minimum_subgradient_cuda_float16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_and_minimum_subgradient_cuda_float32_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_and_minimum_subgradient_cuda_float32_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_and_minimum_subgradient_cuda_float32_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_and_minimum_subgradient_cuda_float32_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_and_minimum_subgradient_cuda_float64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_and_minimum_subgradient_cuda_float64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_and_minimum_subgradient_cuda_float64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_and_minimum_subgradient_cuda_float64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex128_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex128_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex128_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex128_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex128_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex128_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex128_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex128_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex128_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex128_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex128_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex128_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex64_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex64_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_cross_device_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_float_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_float_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_float_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_float_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_float_nan_and_inf_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_float_nan_and_inf_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_float_nan_and_inf_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_float_nan_and_inf_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_forward_ad_float32_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_int_and_bool_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_int_and_bool_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_int_and_bool_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_int_and_bool_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_int_and_bool_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_int_and_bool_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_bfloat16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_bfloat16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_bfloat16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_bfloat16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_bfloat16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_bfloat16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_bfloat16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_bfloat16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_bfloat16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_bfloat16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_bool_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_bool_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_bool_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_bool_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_bool_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_bool_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_bool_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_bool_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_bool_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_bool_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float32_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float32_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float32_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float32_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float32_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float32_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float32_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float32_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float32_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float32_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int32_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int32_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int32_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int32_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int32_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int32_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int32_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int32_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int32_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int32_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int8_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int8_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int8_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int8_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int8_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int8_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int8_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int8_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int8_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int8_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_uint8_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_uint8_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_uint8_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_uint8_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_uint8_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_uint8_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_uint8_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_uint8_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_uint8_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_uint8_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_min_max_binary_op_nan_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_min_max_binary_op_nan_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_min_max_binary_op_nan_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_mul_chalf_tensor_and_cpu_scalar_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_mul_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_mul_intertype_scalar_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_mul_intertype_scalar_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_mul_intertype_scalar_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_muldiv_scalar_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_muldiv_scalar_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_muldiv_scalar_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_muldiv_scalar_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_muldiv_scalar_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_muldiv_scalar_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_muldiv_scalar_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_muldiv_scalar_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_muldiv_scalar_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_muldiv_scalar_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_muldiv_scalar_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_muldiv_scalar_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_nextafter_bfloat16_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_nextafter_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_nextafter_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___radd___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___radd___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___radd___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___radd___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___radd___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___radd___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___radd___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___radd___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___radd___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___radd___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___radd___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___radd___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rand___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rand___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rand___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rand___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rand___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rand___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rdiv___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rdiv___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rdiv___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rdiv___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rdiv___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rdiv___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rdiv___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rdiv___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rdiv___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rdiv___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rdiv___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rdiv___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rmod___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rmod___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rmod___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rmod___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rmod___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rmod___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rmod___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rmod___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rmod___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rmul___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rmul___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rmul___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rmul___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rmul___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rmul___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rmul___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rmul___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rmul___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rmul___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rmul___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rmul___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___ror___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___ror___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___ror___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___ror___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___ror___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___ror___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rpow___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rpow___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rpow___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rpow___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rpow___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rpow___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rpow___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rpow___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rpow___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rpow___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rpow___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rsub___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rsub___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rsub___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rsub___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rsub___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rsub___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rsub___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rsub___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rsub___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rsub___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rsub___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rxor___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rxor___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rxor___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rxor___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rxor___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rxor___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs__conversions_complex_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs__conversions_complex_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs__conversions_complex_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs__conversions_polar_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs__conversions_polar_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_add_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_add_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_add_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_add_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_add_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_atan2_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_atan2_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_atan2_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_atan2_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_atan2_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_atan2_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_atan2_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_atan2_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_atan2_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_atan2_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_left_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_left_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_left_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_left_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_left_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_right_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_right_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_right_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_right_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_right_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_clamp_max_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_clamp_max_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_clamp_max_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_clamp_max_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_clamp_min_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_clamp_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_clamp_min_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_clamp_min_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_copysign_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_copysign_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_copysign_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_copysign_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_copysign_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_copysign_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_copysign_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_copysign_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_copysign_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_copysign_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_floor_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_floor_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_floor_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_floor_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_floor_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_floor_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_floor_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_floor_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_floor_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_no_rounding_mode_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_no_rounding_mode_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_no_rounding_mode_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_no_rounding_mode_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_no_rounding_mode_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_no_rounding_mode_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_no_rounding_mode_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_no_rounding_mode_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_no_rounding_mode_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_no_rounding_mode_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_no_rounding_mode_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_no_rounding_mode_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_no_rounding_mode_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_trunc_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_trunc_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_trunc_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_trunc_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_trunc_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_trunc_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_trunc_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_trunc_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_trunc_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_eq_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_eq_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_eq_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_eq_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_eq_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_float_power_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_float_power_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_float_power_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_float_power_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_floor_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_floor_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_floor_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmax_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmax_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmax_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmax_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmax_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmax_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmax_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmax_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmax_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmax_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmin_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmin_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmin_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmin_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmin_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmin_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmin_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmin_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmin_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmin_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmod_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmod_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmod_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmod_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_gcd_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_gcd_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ge_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ge_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ge_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_gt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_gt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_gt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_gt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_gt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_heaviside_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_heaviside_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_heaviside_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_heaviside_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_hypot_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_hypot_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_hypot_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_hypot_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_igamma_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_igamma_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_igammac_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_igammac_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_isclose_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_isclose_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_isclose_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_isclose_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_isclose_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_lcm_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_lcm_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_le_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_le_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_le_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_le_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logaddexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logaddexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logaddexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logaddexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_lt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_lt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_lt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_lt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_maximum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_maximum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_maximum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_maximum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_maximum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_minimum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_minimum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_minimum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_minimum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_mul_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_mul_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_mul_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_mul_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_mul_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_mul_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_mul_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_mul_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_mul_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_mul_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_mul_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_mul_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_mul_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ne_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ne_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ne_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_nextafter_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_nextafter_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_nextafter_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_nextafter_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_pow_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_pow_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_pow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_remainder_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_remainder_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_rsub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_rsub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_rsub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_rsub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_rsub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_rsub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_rsub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_rsub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_rsub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_rsub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_rsub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_special_xlog1py_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_special_xlog1py_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_special_xlog1py_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_special_xlog1py_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_special_xlog1py_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_special_xlog1py_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_special_xlog1py_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_special_xlog1py_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_special_xlog1py_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_special_xlog1py_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_special_zeta_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_special_zeta_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_special_zeta_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_special_zeta_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_special_zeta_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_special_zeta_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_special_zeta_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_special_zeta_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_sub_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_sub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_sub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_true_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_true_divide_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_true_divide_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_true_divide_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_true_divide_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_true_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_true_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_true_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_true_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_true_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_true_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_true_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_true_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_xlogy_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_xlogy_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_xlogy_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_xlogy_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_xlogy_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_xlogy_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_xlogy_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_xlogy_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_xlogy_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_xlogy_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_add_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_add_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_add_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_add_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_add_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_atan2_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_atan2_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_atan2_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_atan2_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_atan2_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_atan2_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_atan2_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_atan2_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_atan2_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_atan2_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_left_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_left_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_left_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_left_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_left_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_right_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_right_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_right_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_right_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_right_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_clamp_max_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_clamp_max_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_clamp_max_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_clamp_max_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_clamp_min_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_clamp_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_clamp_min_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_clamp_min_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_complex_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_complex_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_complex_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_copysign_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_copysign_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_copysign_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_copysign_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_copysign_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_copysign_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_copysign_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_copysign_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_copysign_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_copysign_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_floor_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_floor_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_floor_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_floor_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_floor_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_floor_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_floor_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_floor_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_floor_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_no_rounding_mode_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_no_rounding_mode_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_no_rounding_mode_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_no_rounding_mode_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_no_rounding_mode_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_no_rounding_mode_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_no_rounding_mode_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_no_rounding_mode_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_no_rounding_mode_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_no_rounding_mode_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_no_rounding_mode_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_no_rounding_mode_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_no_rounding_mode_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_trunc_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_trunc_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_trunc_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_trunc_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_trunc_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_trunc_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_trunc_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_trunc_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_trunc_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_eq_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_eq_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_eq_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_eq_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_eq_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___radd___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___radd___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___radd___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___radd___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___radd___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___radd___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___radd___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___radd___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___radd___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___radd___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___radd___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___radd___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rand___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rand___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rand___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rand___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rand___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rand___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rdiv___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rdiv___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rdiv___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rdiv___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rdiv___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rdiv___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rdiv___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rdiv___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rdiv___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rdiv___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rdiv___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rdiv___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rmod___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rmod___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rmod___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rmod___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rmod___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rmod___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rmod___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rmod___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rmod___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rmul___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rmul___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rmul___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rmul___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rmul___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rmul___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rmul___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rmul___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rmul___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rmul___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rmul___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rmul___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___ror___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___ror___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___ror___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___ror___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___ror___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___ror___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rpow___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rpow___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rpow___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rpow___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rpow___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rpow___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rpow___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rpow___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rpow___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rpow___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rpow___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rsub___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rsub___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rsub___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rsub___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rsub___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rsub___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rsub___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rsub___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rsub___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rsub___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rsub___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rxor___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rxor___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rxor___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rxor___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rxor___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rxor___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs__conversions_complex_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs__conversions_complex_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs__conversions_complex_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs__conversions_polar_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs__conversions_polar_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_add_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_add_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_add_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_add_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_add_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_atan2_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_atan2_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_atan2_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_atan2_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_atan2_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_atan2_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_atan2_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_atan2_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_atan2_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_atan2_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_left_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_left_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_left_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_left_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_left_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_right_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_right_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_right_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_right_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_right_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_clamp_max_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_clamp_max_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_clamp_max_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_clamp_max_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_clamp_min_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_clamp_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_clamp_min_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_clamp_min_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_copysign_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_copysign_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_copysign_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_copysign_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_copysign_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_copysign_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_copysign_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_copysign_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_copysign_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_copysign_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_floor_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_floor_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_floor_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_floor_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_floor_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_floor_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_floor_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_floor_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_floor_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_no_rounding_mode_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_no_rounding_mode_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_no_rounding_mode_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_no_rounding_mode_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_no_rounding_mode_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_no_rounding_mode_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_no_rounding_mode_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_no_rounding_mode_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_no_rounding_mode_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_no_rounding_mode_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_no_rounding_mode_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_no_rounding_mode_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_no_rounding_mode_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_trunc_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_trunc_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_trunc_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_trunc_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_trunc_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_trunc_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_trunc_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_trunc_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_trunc_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_eq_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_eq_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_eq_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_eq_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_eq_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_float_power_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_float_power_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_float_power_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_float_power_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_floor_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_floor_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_floor_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmax_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmax_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmax_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmax_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmax_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmax_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmax_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmax_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmax_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmax_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmin_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmin_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmin_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmin_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmin_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmin_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmin_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmin_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmin_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmin_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmod_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmod_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmod_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmod_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_gcd_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_gcd_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ge_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ge_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ge_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_gt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_gt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_gt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_gt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_gt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_heaviside_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_heaviside_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_heaviside_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_heaviside_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_hypot_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_hypot_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_hypot_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_hypot_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_igamma_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_igamma_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_igammac_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_igammac_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_isclose_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_isclose_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_isclose_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_isclose_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_isclose_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_lcm_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_lcm_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_le_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_le_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_le_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_le_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logaddexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logaddexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logaddexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logaddexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_lt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_lt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_lt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_lt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_maximum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_maximum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_maximum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_maximum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_maximum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_minimum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_minimum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_minimum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_minimum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_mul_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_mul_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_mul_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_mul_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_mul_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_mul_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_mul_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_mul_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_mul_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_mul_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_mul_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_mul_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_mul_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ne_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ne_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ne_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_nextafter_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_nextafter_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_nextafter_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_nextafter_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_pow_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_pow_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_pow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_remainder_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_remainder_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_rsub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_rsub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_rsub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_rsub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_rsub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_rsub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_rsub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_rsub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_rsub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_rsub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_rsub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_special_xlog1py_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_special_xlog1py_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_special_xlog1py_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_special_xlog1py_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_special_xlog1py_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_special_xlog1py_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_special_xlog1py_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_special_xlog1py_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_special_xlog1py_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_special_xlog1py_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_special_zeta_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_special_zeta_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_special_zeta_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_special_zeta_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_special_zeta_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_special_zeta_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_special_zeta_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_special_zeta_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_sub_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_sub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_sub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_true_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_true_divide_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_true_divide_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_true_divide_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_true_divide_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_true_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_true_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_true_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_true_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_true_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_true_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_true_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_true_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_xlogy_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_xlogy_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_xlogy_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_xlogy_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_xlogy_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_xlogy_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_xlogy_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_xlogy_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_xlogy_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_xlogy_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_add_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_add_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_add_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_add_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_add_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_atan2_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_atan2_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_atan2_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_atan2_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_atan2_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_atan2_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_atan2_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_atan2_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_atan2_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_atan2_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_left_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_left_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_left_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_left_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_left_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_right_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_right_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_right_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_right_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_right_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_clamp_max_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_clamp_max_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_clamp_max_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_clamp_max_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_clamp_min_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_clamp_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_clamp_min_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_clamp_min_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_complex_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_complex_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_complex_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_copysign_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_copysign_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_copysign_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_copysign_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_copysign_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_copysign_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_copysign_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_copysign_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_copysign_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_copysign_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_floor_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_floor_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_floor_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_floor_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_floor_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_floor_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_floor_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_floor_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_floor_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_no_rounding_mode_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_no_rounding_mode_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_no_rounding_mode_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_no_rounding_mode_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_no_rounding_mode_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_no_rounding_mode_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_no_rounding_mode_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_no_rounding_mode_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_no_rounding_mode_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_no_rounding_mode_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_no_rounding_mode_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_no_rounding_mode_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_no_rounding_mode_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_trunc_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_trunc_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_trunc_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_trunc_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_trunc_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_trunc_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_trunc_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_trunc_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_trunc_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_eq_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_eq_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_eq_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_eq_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_eq_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_float_power_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_float_power_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_float_power_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_float_power_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_floor_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_floor_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_floor_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmax_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmax_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmax_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmax_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmax_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmax_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmax_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmax_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmax_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmax_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmin_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmin_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmin_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmin_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmin_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmin_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmin_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmin_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmin_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmin_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmod_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmod_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmod_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmod_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_gcd_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_gcd_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ge_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ge_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ge_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_gt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_gt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_gt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_gt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_gt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_heaviside_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_heaviside_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_heaviside_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_heaviside_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_hypot_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_hypot_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_hypot_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_hypot_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_igamma_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_igamma_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_igammac_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_igammac_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_isclose_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_isclose_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_isclose_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_isclose_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_isclose_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_return_by_ref_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_return_by_ref_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_return_by_ref_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_return_by_ref_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_return_by_ref_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_return_by_ref_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_return_by_ref_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_return_by_ref_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_return_by_ref_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_return_by_ref_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_return_by_ref_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_return_by_ref_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_lcm_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_lcm_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ldexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ldexp_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ldexp_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ldexp_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ldexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ldexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ldexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ldexp_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ldexp_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ldexp_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ldexp_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ldexp_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_le_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_le_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_le_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_le_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logaddexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logaddexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logaddexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logaddexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_lt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_lt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_lt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_lt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_max_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_max_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_max_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_max_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_max_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_max_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_max_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_max_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_max_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_max_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_maximum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_maximum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_maximum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_maximum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_maximum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_min_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_min_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_min_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_min_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_min_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_min_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_min_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_min_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_min_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_min_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_minimum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_minimum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_minimum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_minimum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_mul_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_mul_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_mul_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_mul_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_mul_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_mul_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_mul_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_mul_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_mul_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_mul_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_mul_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_mul_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_mul_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ne_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ne_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ne_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_nextafter_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_nextafter_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_nextafter_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_nextafter_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_polar_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_polar_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_pow_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_pow_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_pow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_remainder_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_remainder_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_rsub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_rsub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_rsub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_rsub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_rsub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_rsub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_rsub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_rsub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_rsub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_rsub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_rsub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_t_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_t_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_t_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_t_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_t_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_t_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_t_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_t_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_u_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_u_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_u_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_u_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_u_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_u_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_u_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_u_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_v_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_v_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_v_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_v_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_v_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_v_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_v_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_v_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_w_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_w_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_w_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_w_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_w_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_w_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_w_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_w_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_hermite_polynomial_h_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_hermite_polynomial_h_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_hermite_polynomial_h_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_hermite_polynomial_h_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_hermite_polynomial_h_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_hermite_polynomial_h_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_hermite_polynomial_h_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_hermite_polynomial_h_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_hermite_polynomial_he_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_hermite_polynomial_he_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_hermite_polynomial_he_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_hermite_polynomial_he_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_hermite_polynomial_he_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_hermite_polynomial_he_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_hermite_polynomial_he_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_hermite_polynomial_he_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_laguerre_polynomial_l_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_laguerre_polynomial_l_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_laguerre_polynomial_l_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_laguerre_polynomial_l_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_laguerre_polynomial_l_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_laguerre_polynomial_l_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_laguerre_polynomial_l_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_laguerre_polynomial_l_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_legendre_polynomial_p_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_legendre_polynomial_p_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_legendre_polynomial_p_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_legendre_polynomial_p_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_legendre_polynomial_p_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_legendre_polynomial_p_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_legendre_polynomial_p_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_legendre_polynomial_p_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_t_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_t_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_t_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_t_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_t_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_t_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_t_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_u_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_u_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_u_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_u_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_u_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_u_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_u_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_v_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_v_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_v_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_v_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_v_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_v_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_v_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_w_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_w_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_w_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_w_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_w_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_w_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_w_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_xlog1py_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_xlog1py_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_xlog1py_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_xlog1py_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_xlog1py_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_xlog1py_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_xlog1py_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_xlog1py_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_xlog1py_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_xlog1py_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_zeta_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_zeta_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_zeta_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_zeta_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_zeta_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_zeta_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_zeta_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_zeta_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_sub_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_sub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_sub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_true_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_true_divide_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_true_divide_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_true_divide_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_true_divide_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_true_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_true_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_true_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_true_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_true_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_true_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_true_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_true_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_xlogy_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_xlogy_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_xlogy_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_xlogy_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_xlogy_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_xlogy_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_xlogy_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_xlogy_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_xlogy_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_xlogy_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_float_power_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_float_power_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_float_power_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_float_power_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_floor_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_floor_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_floor_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmax_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmax_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmax_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmax_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmax_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmax_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmax_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmax_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmax_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmax_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmin_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmin_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmin_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmin_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmin_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmin_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmin_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmin_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmin_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmin_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmod_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmod_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmod_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmod_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_gcd_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_gcd_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ge_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ge_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ge_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_gt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_gt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_gt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_gt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_gt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_heaviside_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_heaviside_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_heaviside_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_heaviside_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_hypot_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_hypot_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_hypot_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_hypot_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_igamma_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_igamma_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_igammac_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_igammac_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___radd___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___radd___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___radd___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___radd___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___radd___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___radd___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___radd___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___radd___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___radd___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___radd___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___radd___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___radd___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rand___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rand___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rand___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rand___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rand___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rand___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rdiv___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rdiv___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rdiv___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rdiv___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rdiv___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rdiv___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rdiv___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rdiv___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rdiv___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rdiv___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rdiv___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rdiv___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rmod___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rmod___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rmod___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rmod___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rmod___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rmod___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rmod___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rmod___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rmod___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rmul___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rmul___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rmul___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rmul___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rmul___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rmul___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rmul___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rmul___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rmul___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rmul___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rmul___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rmul___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___ror___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___ror___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___ror___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___ror___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___ror___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___ror___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rpow___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rpow___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rpow___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rpow___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rpow___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rpow___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rpow___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rpow___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rpow___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rpow___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rpow___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rsub___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rsub___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rsub___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rsub___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rsub___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rsub___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rsub___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rsub___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rsub___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rsub___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rsub___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rxor___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rxor___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rxor___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rxor___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rxor___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rxor___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs__conversions_complex_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs__conversions_complex_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs__conversions_complex_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs__conversions_polar_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs__conversions_polar_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_add_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_add_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_add_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_add_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_add_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_atan2_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_atan2_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_atan2_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_atan2_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_atan2_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_atan2_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_atan2_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_atan2_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_atan2_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_atan2_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_left_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_left_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_left_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_left_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_left_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_right_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_right_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_right_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_right_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_right_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_clamp_max_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_clamp_max_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_clamp_max_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_clamp_max_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_clamp_min_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_clamp_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_clamp_min_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_clamp_min_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_copysign_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_copysign_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_copysign_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_copysign_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_copysign_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_copysign_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_copysign_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_copysign_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_copysign_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_copysign_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_floor_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_floor_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_floor_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_floor_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_floor_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_floor_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_floor_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_floor_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_floor_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_no_rounding_mode_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_no_rounding_mode_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_no_rounding_mode_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_no_rounding_mode_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_no_rounding_mode_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_no_rounding_mode_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_no_rounding_mode_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_no_rounding_mode_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_no_rounding_mode_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_no_rounding_mode_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_no_rounding_mode_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_no_rounding_mode_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_no_rounding_mode_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_trunc_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_trunc_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_trunc_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_trunc_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_trunc_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_trunc_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_trunc_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_trunc_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_trunc_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_eq_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_eq_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_eq_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_eq_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_eq_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_float_power_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_float_power_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_float_power_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_float_power_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_floor_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_floor_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_floor_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmax_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmax_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmax_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmax_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmax_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmax_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmax_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmax_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmax_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmax_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmin_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmin_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmin_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmin_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmin_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmin_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmin_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmin_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmin_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmin_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmod_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmod_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmod_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmod_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_gcd_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_gcd_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ge_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ge_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ge_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_gt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_gt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_gt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_gt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_gt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_heaviside_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_heaviside_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_heaviside_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_heaviside_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_hypot_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_hypot_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_hypot_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_hypot_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_igamma_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_igamma_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_igammac_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_igammac_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_isclose_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_isclose_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_isclose_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_isclose_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_isclose_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_lcm_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_lcm_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_le_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_le_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_le_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_le_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logaddexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logaddexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logaddexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logaddexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_lt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_lt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_lt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_lt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_maximum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_maximum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_maximum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_maximum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_maximum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_minimum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_minimum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_minimum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_minimum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_mul_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_mul_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_mul_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_mul_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_mul_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_mul_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_mul_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_mul_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_mul_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_mul_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_mul_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_mul_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_mul_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ne_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ne_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ne_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_nextafter_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_nextafter_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_nextafter_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_nextafter_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_pow_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_pow_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_pow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_remainder_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_remainder_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_rsub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_rsub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_rsub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_rsub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_rsub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_rsub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_rsub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_rsub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_rsub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_rsub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_rsub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_special_xlog1py_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_special_xlog1py_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_special_xlog1py_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_special_xlog1py_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_special_xlog1py_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_special_xlog1py_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_special_xlog1py_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_special_xlog1py_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_special_xlog1py_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_special_xlog1py_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_special_zeta_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_special_zeta_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_special_zeta_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_special_zeta_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_special_zeta_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_special_zeta_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_special_zeta_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_special_zeta_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_sub_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_sub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_sub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_true_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_true_divide_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_true_divide_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_true_divide_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_true_divide_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_true_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_true_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_true_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_true_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_true_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_true_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_true_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_true_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_xlogy_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_xlogy_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_xlogy_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_xlogy_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_xlogy_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_xlogy_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_xlogy_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_xlogy_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_xlogy_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_xlogy_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_add_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_add_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_add_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_add_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_add_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_atan2_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_atan2_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_atan2_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_atan2_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_atan2_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_atan2_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_atan2_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_atan2_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_atan2_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_atan2_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_left_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_left_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_left_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_left_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_left_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_right_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_right_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_right_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_right_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_right_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_clamp_max_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_clamp_max_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_clamp_max_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_clamp_max_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_clamp_min_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_clamp_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_clamp_min_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_clamp_min_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_complex_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_complex_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_complex_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_copysign_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_copysign_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_copysign_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_copysign_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_copysign_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_copysign_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_copysign_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_copysign_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_copysign_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_copysign_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_floor_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_floor_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_floor_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_floor_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_floor_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_floor_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_floor_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_floor_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_floor_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_no_rounding_mode_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_no_rounding_mode_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_no_rounding_mode_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_no_rounding_mode_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_no_rounding_mode_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_no_rounding_mode_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_no_rounding_mode_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_no_rounding_mode_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_no_rounding_mode_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_no_rounding_mode_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_no_rounding_mode_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_no_rounding_mode_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_no_rounding_mode_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_trunc_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_trunc_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_trunc_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_trunc_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_trunc_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_trunc_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_trunc_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_trunc_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_trunc_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_eq_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_eq_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_eq_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_eq_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_eq_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_float_power_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_float_power_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_float_power_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_float_power_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_floor_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_floor_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_floor_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmax_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmax_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmax_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmax_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmax_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmax_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmax_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmax_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmax_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmax_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmin_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmin_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmin_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmin_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmin_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmin_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmin_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmin_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmin_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmin_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmod_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmod_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmod_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmod_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_gcd_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_gcd_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ge_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ge_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ge_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_gt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_gt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_gt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_gt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_gt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_heaviside_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_heaviside_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_heaviside_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_heaviside_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_hypot_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_hypot_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_hypot_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_hypot_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_igamma_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_igamma_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_igammac_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_igammac_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_isclose_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_isclose_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_isclose_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_isclose_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_isclose_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_return_by_ref_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_return_by_ref_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_return_by_ref_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_return_by_ref_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_return_by_ref_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_return_by_ref_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_return_by_ref_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_return_by_ref_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_return_by_ref_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_return_by_ref_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_return_by_ref_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_return_by_ref_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_lcm_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_lcm_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ldexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ldexp_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ldexp_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ldexp_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ldexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ldexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ldexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ldexp_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ldexp_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ldexp_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ldexp_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ldexp_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_le_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_le_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_le_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_le_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logaddexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logaddexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logaddexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logaddexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_lt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_lt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_lt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_lt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_max_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_max_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_max_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_max_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_max_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_max_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_max_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_max_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_max_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_max_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_maximum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_maximum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_maximum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_maximum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_maximum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_min_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_min_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_min_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_min_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_min_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_min_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_min_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_min_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_min_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_min_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_minimum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_minimum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_minimum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_minimum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_mul_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_mul_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_mul_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_mul_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_mul_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_mul_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_mul_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_mul_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_mul_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_mul_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_mul_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_mul_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_mul_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ne_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ne_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ne_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_nextafter_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_nextafter_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_nextafter_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_nextafter_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_polar_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_polar_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_pow_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_pow_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_pow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_remainder_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_remainder_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_rsub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_rsub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_rsub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_rsub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_rsub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_rsub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_rsub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_rsub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_rsub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_rsub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_rsub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_t_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_t_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_t_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_t_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_t_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_t_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_t_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_t_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_u_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_u_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_u_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_u_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_u_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_u_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_u_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_u_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_v_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_v_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_v_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_v_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_v_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_v_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_v_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_v_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_w_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_w_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_w_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_w_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_w_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_w_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_w_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_w_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_hermite_polynomial_h_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_hermite_polynomial_h_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_hermite_polynomial_h_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_hermite_polynomial_h_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_hermite_polynomial_h_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_hermite_polynomial_h_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_hermite_polynomial_h_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_hermite_polynomial_h_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_hermite_polynomial_he_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_hermite_polynomial_he_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_hermite_polynomial_he_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_hermite_polynomial_he_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_hermite_polynomial_he_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_hermite_polynomial_he_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_hermite_polynomial_he_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_hermite_polynomial_he_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_laguerre_polynomial_l_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_laguerre_polynomial_l_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_laguerre_polynomial_l_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_laguerre_polynomial_l_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_laguerre_polynomial_l_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_laguerre_polynomial_l_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_laguerre_polynomial_l_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_laguerre_polynomial_l_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_legendre_polynomial_p_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_legendre_polynomial_p_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_legendre_polynomial_p_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_legendre_polynomial_p_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_legendre_polynomial_p_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_legendre_polynomial_p_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_legendre_polynomial_p_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_legendre_polynomial_p_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_t_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_t_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_t_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_t_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_t_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_t_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_t_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_u_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_u_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_u_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_u_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_u_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_u_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_u_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_v_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_v_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_v_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_v_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_v_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_v_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_v_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_w_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_w_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_w_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_w_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_w_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_w_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_w_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_xlog1py_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_xlog1py_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_xlog1py_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_xlog1py_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_xlog1py_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_xlog1py_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_xlog1py_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_xlog1py_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_xlog1py_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_xlog1py_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_zeta_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_zeta_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_zeta_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_zeta_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_zeta_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_zeta_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_zeta_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_zeta_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_sub_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_sub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_sub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_true_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_true_divide_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_true_divide_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_true_divide_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_true_divide_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_true_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_true_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_true_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_true_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_true_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_true_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_true_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_true_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_xlogy_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_xlogy_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_xlogy_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_xlogy_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_xlogy_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_xlogy_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_xlogy_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_xlogy_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_xlogy_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_xlogy_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_isclose_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_isclose_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_isclose_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_isclose_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_isclose_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_return_by_ref_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_return_by_ref_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_return_by_ref_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_return_by_ref_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_return_by_ref_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_return_by_ref_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_return_by_ref_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_return_by_ref_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_return_by_ref_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_return_by_ref_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_return_by_ref_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_return_by_ref_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_lcm_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_lcm_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ldexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ldexp_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ldexp_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ldexp_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ldexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ldexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ldexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ldexp_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ldexp_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ldexp_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ldexp_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ldexp_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_le_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_le_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_le_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_le_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logaddexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logaddexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logaddexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logaddexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_lt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_lt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_lt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_lt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_max_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_max_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_max_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_max_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_max_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_max_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_max_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_max_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_max_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_max_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_maximum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_maximum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_maximum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_maximum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_maximum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_min_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_min_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_min_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_min_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_min_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_min_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_min_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_min_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_min_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_min_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_minimum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_minimum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_minimum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_minimum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_mul_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_mul_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_mul_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_mul_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_mul_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_mul_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_mul_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_mul_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_mul_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_mul_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_mul_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_mul_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_mul_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ne_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ne_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ne_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_nextafter_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_nextafter_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_nextafter_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_nextafter_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_polar_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_polar_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_pow_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_pow_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_pow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_remainder_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_remainder_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_rsub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_rsub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_rsub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_rsub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_rsub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_rsub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_rsub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_rsub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_rsub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_rsub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_rsub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_t_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_t_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_t_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_t_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_t_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_t_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_t_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_t_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_u_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_u_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_u_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_u_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_u_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_u_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_u_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_u_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_v_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_v_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_v_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_v_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_v_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_v_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_v_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_v_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_w_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_w_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_w_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_w_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_w_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_w_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_w_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_w_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_hermite_polynomial_h_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_hermite_polynomial_h_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_hermite_polynomial_h_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_hermite_polynomial_h_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_hermite_polynomial_h_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_hermite_polynomial_h_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_hermite_polynomial_h_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_hermite_polynomial_h_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_hermite_polynomial_he_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_hermite_polynomial_he_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_hermite_polynomial_he_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_hermite_polynomial_he_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_hermite_polynomial_he_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_hermite_polynomial_he_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_hermite_polynomial_he_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_hermite_polynomial_he_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_laguerre_polynomial_l_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_laguerre_polynomial_l_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_laguerre_polynomial_l_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_laguerre_polynomial_l_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_laguerre_polynomial_l_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_laguerre_polynomial_l_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_laguerre_polynomial_l_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_laguerre_polynomial_l_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_legendre_polynomial_p_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_legendre_polynomial_p_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_legendre_polynomial_p_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_legendre_polynomial_p_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_legendre_polynomial_p_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_legendre_polynomial_p_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_legendre_polynomial_p_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_legendre_polynomial_p_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_t_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_t_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_t_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_t_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_t_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_t_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_t_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_u_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_u_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_u_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_u_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_u_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_u_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_u_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_v_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_v_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_v_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_v_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_v_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_v_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_v_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_w_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_w_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_w_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_w_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_w_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_w_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_w_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_xlog1py_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_xlog1py_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_xlog1py_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_xlog1py_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_xlog1py_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_xlog1py_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_xlog1py_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_xlog1py_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_xlog1py_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_xlog1py_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_zeta_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_zeta_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_zeta_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_zeta_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_zeta_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_zeta_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_zeta_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_zeta_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_sub_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_sub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_sub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_true_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_true_divide_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_true_divide_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_true_divide_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_true_divide_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_true_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_true_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_true_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_true_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_true_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_true_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_true_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_true_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_xlogy_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_xlogy_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_xlogy_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_xlogy_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_xlogy_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_xlogy_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_xlogy_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_xlogy_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_xlogy_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_xlogy_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable___radd___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable___rdiv___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable___rmod___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable___rmul___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable___rpow___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable___rsub___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs__conversions_complex_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs__conversions_polar_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_atan2_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_copysign_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_div_floor_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_div_no_rounding_mode_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_div_trunc_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_fmax_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_fmin_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_hypot_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_igamma_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_igammac_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_logaddexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_mul_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_nextafter_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_rsub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_special_xlog1py_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_special_zeta_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_true_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_xlogy_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_atan2_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_complex_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_copysign_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_div_floor_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_div_no_rounding_mode_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_div_trunc_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_fmax_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_fmin_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_hypot_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_igamma_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_igammac_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_jiterator_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_jiterator_binary_return_by_ref_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_ldexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_logaddexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_max_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_min_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_mul_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_nextafter_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_polar_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_rsub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_special_chebyshev_polynomial_t_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_special_chebyshev_polynomial_u_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_special_chebyshev_polynomial_v_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_special_chebyshev_polynomial_w_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_special_hermite_polynomial_h_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_special_hermite_polynomial_he_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_special_laguerre_polynomial_l_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_special_legendre_polynomial_p_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_special_xlog1py_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_special_zeta_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_true_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_xlogy_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_out_resize_warning_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_pow_cuda_complex_extremal_passing_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_pow_cuda_complex_extremal_passing_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_pow_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_pow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_pow_inplace_resizing_exception_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_pow_scalar_base_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_pow_scalar_overloads_mem_overlap_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_pow_scalar_type_promotion_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_rdiv_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_rdiv_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_rdiv_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_rdiv_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_rdiv_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_rdiv_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_rdiv_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_rdiv_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_rdiv_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_add_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_add_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_add_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_add_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_add_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_bitwise_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_bitwise_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_bitwise_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_bitwise_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_bitwise_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_bitwise_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_bitwise_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_bitwise_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_bitwise_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_bitwise_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_bitwise_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_clamp_max_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_clamp_max_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_clamp_max_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_clamp_max_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_clamp_min_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_clamp_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_clamp_min_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_clamp_min_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_eq_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_eq_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_eq_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_eq_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_eq_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_float_power_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_float_power_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_float_power_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_float_power_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_floor_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_floor_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_floor_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_fmod_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_fmod_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_fmod_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_fmod_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_gcd_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_gcd_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ge_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ge_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ge_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_gt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_gt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_gt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_gt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_gt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_heaviside_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_heaviside_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_heaviside_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_heaviside_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_isclose_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_isclose_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_isclose_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_isclose_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_isclose_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_lcm_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_lcm_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_le_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_le_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_le_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_le_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_lt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_lt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_lt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_lt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_maximum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_maximum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_maximum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_maximum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_maximum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_minimum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_minimum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_minimum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_minimum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ne_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ne_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ne_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_pow_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_pow_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_pow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_remainder_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_remainder_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_sub_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_sub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_sub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_add_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_add_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_add_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_add_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_add_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_bitwise_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_bitwise_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_bitwise_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_bitwise_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_bitwise_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_bitwise_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_bitwise_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_bitwise_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_bitwise_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_bitwise_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_bitwise_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_clamp_max_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_clamp_max_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_clamp_max_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_clamp_max_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_clamp_min_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_clamp_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_clamp_min_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_clamp_min_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_eq_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_eq_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_eq_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_eq_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_eq_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_jiterator_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_jiterator_binary_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_jiterator_binary_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_jiterator_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_jiterator_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_jiterator_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_jiterator_binary_return_by_ref_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_jiterator_binary_return_by_ref_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_jiterator_binary_return_by_ref_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_jiterator_binary_return_by_ref_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_jiterator_binary_return_by_ref_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_jiterator_binary_return_by_ref_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_max_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_max_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_max_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_max_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_min_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_min_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_min_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_min_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_float_power_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_float_power_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_float_power_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_float_power_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_floor_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_floor_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_floor_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_fmod_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_fmod_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_fmod_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_fmod_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_gcd_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_gcd_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ge_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ge_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ge_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_gt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_gt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_gt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_gt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_gt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_heaviside_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_heaviside_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_heaviside_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_heaviside_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_isclose_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_isclose_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_isclose_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_isclose_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_isclose_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_return_by_ref_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_return_by_ref_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_return_by_ref_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_return_by_ref_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_return_by_ref_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_return_by_ref_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_return_by_ref_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_return_by_ref_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_return_by_ref_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_return_by_ref_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_return_by_ref_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_return_by_ref_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_add_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_bitwise_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_bitwise_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_clamp_max_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_clamp_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_eq_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_float_power_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_floor_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_fmod_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_fmod_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_gt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_gt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_heaviside_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_isclose_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_isclose_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_le_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_lt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_maximum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_maximum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_minimum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_add_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_bitwise_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_bitwise_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_clamp_max_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_clamp_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_eq_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_float_power_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_floor_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_fmod_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_fmod_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_gt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_gt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_heaviside_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_isclose_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_isclose_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_jiterator_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_jiterator_binary_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_jiterator_binary_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_jiterator_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_jiterator_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_jiterator_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_jiterator_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_jiterator_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_jiterator_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_jiterator_binary_return_by_ref_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_jiterator_binary_return_by_ref_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_jiterator_binary_return_by_ref_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_jiterator_binary_return_by_ref_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_jiterator_binary_return_by_ref_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_jiterator_binary_return_by_ref_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_jiterator_binary_return_by_ref_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_jiterator_binary_return_by_ref_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_jiterator_binary_return_by_ref_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_le_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_lt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_max_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_max_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_max_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_max_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_max_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_max_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_max_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_maximum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_maximum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_min_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_min_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_min_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_min_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_min_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_min_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_min_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_minimum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_lcm_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_lcm_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_le_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_le_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_le_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_le_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_lt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_lt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_lt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_lt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_max_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_max_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_max_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_max_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_max_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_max_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_max_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_max_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_max_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_max_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_maximum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_maximum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_maximum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_maximum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_maximum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_min_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_min_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_min_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_min_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_min_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_min_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_min_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_min_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_min_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_min_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_minimum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_minimum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_minimum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_minimum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ne_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ne_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ne_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_pow_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_pow_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_pow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_remainder_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_remainder_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_add_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_add_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_add_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_add_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_add_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_bitwise_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_bitwise_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_bitwise_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_bitwise_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_bitwise_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_bitwise_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_bitwise_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_bitwise_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_bitwise_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_bitwise_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_bitwise_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_clamp_max_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_clamp_max_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_clamp_max_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_clamp_max_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_clamp_min_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_clamp_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_clamp_min_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_clamp_min_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_eq_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_eq_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_eq_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_eq_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_eq_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_float_power_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_float_power_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_float_power_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_float_power_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_floor_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_floor_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_floor_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_fmod_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_fmod_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_fmod_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_fmod_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_gcd_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_gcd_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ge_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ge_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ge_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_gt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_gt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_gt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_gt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_gt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_heaviside_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_heaviside_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_heaviside_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_heaviside_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_isclose_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_isclose_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_isclose_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_isclose_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_isclose_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_lcm_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_lcm_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_le_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_le_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_le_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_le_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_lt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_lt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_lt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_lt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_maximum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_maximum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_maximum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_maximum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_maximum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_minimum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_minimum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_minimum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_minimum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ne_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ne_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ne_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_pow_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_pow_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_pow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_remainder_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_remainder_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_sub_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_sub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_sub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_add_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_add_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_add_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_add_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_add_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_bitwise_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_bitwise_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_bitwise_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_bitwise_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_bitwise_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_bitwise_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_bitwise_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_bitwise_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_bitwise_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_bitwise_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_bitwise_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_clamp_max_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_clamp_max_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_clamp_max_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_clamp_max_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_clamp_min_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_clamp_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_clamp_min_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_clamp_min_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_eq_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_eq_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_eq_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_eq_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_eq_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_float_power_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_float_power_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_float_power_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_float_power_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_floor_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_floor_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_floor_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_fmod_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_fmod_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_fmod_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_fmod_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_gcd_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_gcd_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ge_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ge_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ge_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_gt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_gt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_gt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_gt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_gt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_heaviside_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_heaviside_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_heaviside_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_heaviside_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_isclose_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_isclose_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_isclose_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_isclose_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_isclose_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_return_by_ref_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_return_by_ref_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_return_by_ref_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_return_by_ref_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_return_by_ref_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_return_by_ref_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_return_by_ref_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_return_by_ref_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_return_by_ref_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_return_by_ref_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_return_by_ref_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_return_by_ref_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_lcm_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_lcm_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_le_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_le_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_le_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_le_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_lt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_lt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_lt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_lt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_max_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_max_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_max_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_max_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_max_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_max_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_max_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_max_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_max_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_max_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_maximum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_maximum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_maximum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_maximum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_maximum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_min_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_min_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_min_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_min_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_min_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_min_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_min_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_min_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_min_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_min_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_minimum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_minimum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_minimum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_minimum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ne_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ne_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ne_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_pow_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_pow_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_pow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_remainder_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_remainder_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_sub_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_sub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_sub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_sub_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_sub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_sub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_remainder_fmod_large_dividend_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_remainder_fmod_large_dividend_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_remainder_overflow_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_rpow_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_jiterator_binary_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_jiterator_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_jiterator_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_jiterator_binary_return_by_ref_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_jiterator_binary_return_by_ref_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_jiterator_binary_return_by_ref_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_max_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_max_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_min_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_min_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_shift_limits_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_shift_limits_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_shift_limits_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_shift_limits_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_shift_limits_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_signed_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_signed_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_signed_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_signed_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_sub_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_sub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_sub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_sub_typing_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_tensor_pow_tensor_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_trapezoid_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_true_divide_out_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_true_divide_out_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion___radd___cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion___rand___cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion___rdiv___cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion___rmod___cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion___rmul___cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion___ror___cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion___rpow___cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion___rsub___cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion___rxor___cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs__conversions_complex_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs__conversions_polar_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_add_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_atan2_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_bitwise_and_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_bitwise_left_shift_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_bitwise_or_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_bitwise_right_shift_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_bitwise_xor_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_clamp_max_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_clamp_min_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_copysign_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_div_floor_rounding_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_div_no_rounding_mode_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_div_trunc_rounding_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_eq_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_float_power_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_floor_divide_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_fmax_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_fmin_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_fmod_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_gcd_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_ge_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_gt_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_heaviside_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_hypot_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_igamma_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_igammac_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_isclose_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_lcm_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_le_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_logaddexp_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_logical_and_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_logical_or_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_logical_xor_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_lt_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_maximum_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_minimum_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_mul_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_ne_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_nextafter_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_pow_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_remainder_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_rsub_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_special_xlog1py_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_special_zeta_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_sub_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_true_divide_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_xlogy_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_add_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_atan2_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_bitwise_and_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_bitwise_left_shift_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_bitwise_or_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_bitwise_right_shift_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_bitwise_xor_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_clamp_max_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_clamp_min_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_complex_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_copysign_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_div_floor_rounding_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_div_no_rounding_mode_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_div_trunc_rounding_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_eq_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_float_power_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_floor_divide_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_fmax_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_fmin_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_fmod_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_gcd_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_ge_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_gt_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_heaviside_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_hypot_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_igamma_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_igammac_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_isclose_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_jiterator_binary_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_jiterator_binary_return_by_ref_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_lcm_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_ldexp_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_le_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_logaddexp_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_logical_and_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_logical_or_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_logical_xor_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_lt_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_max_binary_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_maximum_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_min_binary_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_minimum_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_mul_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_ne_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_nextafter_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_polar_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_pow_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_remainder_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_rsub_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_special_chebyshev_polynomial_t_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_special_chebyshev_polynomial_u_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_special_chebyshev_polynomial_v_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_special_chebyshev_polynomial_w_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_special_hermite_polynomial_h_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_special_hermite_polynomial_he_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_special_laguerre_polynomial_l_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_special_legendre_polynomial_p_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_special_shifted_chebyshev_polynomial_t_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_special_shifted_chebyshev_polynomial_u_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_special_shifted_chebyshev_polynomial_v_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_special_shifted_chebyshev_polynomial_w_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_special_xlog1py_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_special_zeta_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_sub_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_true_divide_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_xlogy_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_bfloat16_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_bool_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_bool_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_bool_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_bool_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_bool_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_bool_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_bool_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_bool_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_bool_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float32_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float32_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float32_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float32_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float32_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float32_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float32_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float32_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float32_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int32_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int32_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int32_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int32_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int32_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int32_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int32_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int32_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int32_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int8_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int8_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int8_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int8_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int8_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int8_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int8_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int8_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int8_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_uint8_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_uint8_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_uint8_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_uint8_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_uint8_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_uint8_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_uint8_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_uint8_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_uint8_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_gradients_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_scalar_type_promotion_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_bool_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_bool_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_bool_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_bool_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_bool_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_bool_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_bool_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_bool_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_float32_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_float32_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_float32_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_float32_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_float32_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_float32_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_float32_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_float32_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_float64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_float64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_float64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_float64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_float64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_float64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_float64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_float64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int32_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int32_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int32_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int32_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int32_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int32_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int32_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int32_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int8_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int8_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int8_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int8_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int8_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int8_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int8_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int8_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_uint8_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_uint8_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_uint8_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_uint8_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_uint8_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_uint8_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_uint8_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_uint8_uint8 2025-08-14T23:34:56.3740291Z 2025-08-14T23:34:56.3740450Z Running test_legacy_vmap 1/1 ... [2025-08-14 23:34:55.681440] 2025-08-14T23:34:56.3740756Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T23:34:56.3741097Z GITHUB_RUN_ID, GITHUB_RUN_ATTEMPT, or ARTIFACTS_FILE_SUFFIX not set, not uploading 2025-08-14T23:34:56.3742230Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_legacy_vmap.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 23:34:55.682120] 2025-08-14T23:34:56.3743033Z Uploading artifacts took 0.00 seconds 2025-08-14T23:34:59.0904310Z 2025-08-14T23:34:59.0905199Z test_hub 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_hub_1.1_5c8e6500c0267244_.log 2025-08-14T23:34:59.0908871Z Running 20 items in this shard: test/test_hub.py::TestHub::test_download_url_to_file, test/test_hub.py::TestHub::test_get_set_dir, test/test_hub.py::TestHub::test_hub_parse_repo_info, test/test_hub.py::TestHub::test_list_entrypoints, test/test_hub.py::TestHub::test_load_commit_from_forked_repo, test/test_hub.py::TestHub::test_load_from_branch, test/test_hub.py::TestHub::test_load_from_github, test/test_hub.py::TestHub::test_load_from_local_dir, test/test_hub.py::TestHub::test_load_legacy_zip_checkpoint, test/test_hub.py::TestHub::test_load_state_dict_from_url, test/test_hub.py::TestHub::test_load_zip_1_6_checkpoint, test/test_hub.py::TestHub::test_trust_repo_builtin_trusted_owners, test/test_hub.py::TestHub::test_trust_repo_check_no, test/test_hub.py::TestHub::test_trust_repo_check_yes, test/test_hub.py::TestHub::test_trust_repo_false_emptystring, test/test_hub.py::TestHub::test_trust_repo_false_no, test/test_hub.py::TestHub::test_trust_repo_legacy, test/test_hub.py::TestHub::test_trust_repo_none, test/test_hub.py::TestHub::test_trust_repo_true, test/test_hub.py::TestHub::test_trusted_repo_false_yes 2025-08-14T23:34:59.0912291Z 2025-08-14T23:34:59.0912611Z Running test_meta 3/4 ... [2025-08-14 23:34:59.090419] 2025-08-14T23:34:59.0913148Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T23:34:59.0914074Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_meta.py', '-m', 'not serial', '--shard-id=3', '--num-shards=4', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 23:34:59.090730] 2025-08-14T23:35:58.4270300Z 2025-08-14T23:35:58.4271565Z test_legacy_vmap 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_legacy_vmap_1.1_bc9a64eb5fc32ef5_.log 2025-08-14T23:35:58.4308255Z Running 124 items in this shard: test/test_legacy_vmap.py::TestVmapAPILegacy::test_accepts_nested_inputs, test/test_legacy_vmap.py::TestVmapAPILegacy::test_backward_unsupported_interaction, test/test_legacy_vmap.py::TestVmapAPILegacy::test_batched_gradient_basic, test/test_legacy_vmap.py::TestVmapAPILegacy::test_constant_function, test/test_legacy_vmap.py::TestVmapAPILegacy::test_different_map_dim_size_raises, test/test_legacy_vmap.py::TestVmapAPILegacy::test_fallback_atan2, test/test_legacy_vmap.py::TestVmapAPILegacy::test_fallback_does_not_warn_by_default, test/test_legacy_vmap.py::TestVmapAPILegacy::test_fallback_masked_fill, test/test_legacy_vmap.py::TestVmapAPILegacy::test_fallback_multiple_returns, test/test_legacy_vmap.py::TestVmapAPILegacy::test_fallback_warns_when_warnings_are_enabled, test/test_legacy_vmap.py::TestVmapAPILegacy::test_fallback_with_undefined_grad, test/test_legacy_vmap.py::TestVmapAPILegacy::test_fallback_zero_dim, test/test_legacy_vmap.py::TestVmapAPILegacy::test_func_with_no_inputs, test/test_legacy_vmap.py::TestVmapAPILegacy::test_functools_partial, test/test_legacy_vmap.py::TestVmapAPILegacy::test_grad_unsupported_interaction, test/test_legacy_vmap.py::TestVmapAPILegacy::test_in_dim_not_in_tensor_err_msg, test/test_legacy_vmap.py::TestVmapAPILegacy::test_in_dims_wrong_type_err_msg, test/test_legacy_vmap.py::TestVmapAPILegacy::test_inplace_fallback_nary_different_levels, test/test_legacy_vmap.py::TestVmapAPILegacy::test_inplace_fallback_nary_same_levels, test/test_legacy_vmap.py::TestVmapAPILegacy::test_inplace_fallback_unary, test/test_legacy_vmap.py::TestVmapAPILegacy::test_integer_in_dim_but_not_tensor_input_err_msg, test/test_legacy_vmap.py::TestVmapAPILegacy::test_multiple_inputs, test/test_legacy_vmap.py::TestVmapAPILegacy::test_multiple_out_dims, test/test_legacy_vmap.py::TestVmapAPILegacy::test_multiple_outputs, test/test_legacy_vmap.py::TestVmapAPILegacy::test_multiple_outputs_error_cases, test/test_legacy_vmap.py::TestVmapAPILegacy::test_nested_non_default_in_dims, test/test_legacy_vmap.py::TestVmapAPILegacy::test_nested_out_dims, test/test_legacy_vmap.py::TestVmapAPILegacy::test_nested_with_different_map_dim, test/test_legacy_vmap.py::TestVmapAPILegacy::test_nested_with_same_map_dim, test/test_legacy_vmap.py::TestVmapAPILegacy::test_nn_module, test/test_legacy_vmap.py::TestVmapAPILegacy::test_non_default_in_dims_out_dims, test/test_legacy_vmap.py::TestVmapAPILegacy::test_non_tensor_output_raises, test/test_legacy_vmap.py::TestVmapAPILegacy::test_non_zero_in_dims, test/test_legacy_vmap.py::TestVmapAPILegacy::test_none_in_dims, test/test_legacy_vmap.py::TestVmapAPILegacy::test_nonzero_out_dims, test/test_legacy_vmap.py::TestVmapAPILegacy::test_noop_in_inner_vmap, test/test_legacy_vmap.py::TestVmapAPILegacy::test_not_enough_in_dims_err_msg, test/test_legacy_vmap.py::TestVmapAPILegacy::test_out_dim_out_of_bounds_err_msg, test/test_legacy_vmap.py::TestVmapAPILegacy::test_out_dims_and_num_outputs_mismatch_err_msg, test/test_legacy_vmap.py::TestVmapAPILegacy::test_out_dims_edge_case, test/test_legacy_vmap.py::TestVmapAPILegacy::test_out_dims_must_be_int_or_tuple_of_int_err_msg, test/test_legacy_vmap.py::TestVmapAPILegacy::test_single_input, test/test_legacy_vmap.py::TestVmapAPILegacy::test_unsupported_op_err_msg, test/test_legacy_vmap.py::TestVmapOperatorsLegacy::test_T_numpy, test/test_legacy_vmap.py::TestVmapOperatorsLegacy::test_as_strided, test/test_legacy_vmap.py::TestVmapOperatorsLegacy::test_binary_pointwise_ops, test/test_legacy_vmap.py::TestVmapOperatorsLegacy::test_bmm, test/test_legacy_vmap.py::TestVmapOperatorsLegacy::test_cat, test/test_legacy_vmap.py::TestVmapOperatorsLegacy::test_chunk, test/test_legacy_vmap.py::TestVmapOperatorsLegacy::test_clamp, test/test_legacy_vmap.py::TestVmapOperatorsLegacy::test_clone, test/test_legacy_vmap.py::TestVmapOperatorsLegacy::test_comparison_ops, test/test_legacy_vmap.py::TestVmapOperatorsLegacy::test_conj, test/test_legacy_vmap.py::TestVmapOperatorsLegacy::test_contiguous, test/test_legacy_vmap.py::TestVmapOperatorsLegacy::test_diagonal, test/test_legacy_vmap.py::TestVmapOperatorsLegacy::test_dot, test/test_legacy_vmap.py::TestVmapOperatorsLegacy::test_expand_as, test/test_legacy_vmap.py::TestVmapOperatorsLegacy::test_fill_and_zero_inplace, test/test_legacy_vmap.py::TestVmapOperatorsLegacy::test_imag, test/test_legacy_vmap.py::TestVmapOperatorsLegacy::test_is_complex, test/test_legacy_vmap.py::TestVmapOperatorsLegacy::test_is_contiguous, test/test_legacy_vmap.py::TestVmapOperatorsLegacy::test_is_floating_point, test/test_legacy_vmap.py::TestVmapOperatorsLegacy::test_mm, test/test_legacy_vmap.py::TestVmapOperatorsLegacy::test_movedim, test/test_legacy_vmap.py::TestVmapOperatorsLegacy::test_mv, test/test_legacy_vmap.py::TestVmapOperatorsLegacy::test_narrow, test/test_legacy_vmap.py::TestVmapOperatorsLegacy::test_new_empty, test/test_legacy_vmap.py::TestVmapOperatorsLegacy::test_new_empty_strided, test/test_legacy_vmap.py::TestVmapOperatorsLegacy::test_new_zeros, test/test_legacy_vmap.py::TestVmapOperatorsLegacy::test_no_random_op_support, test/test_legacy_vmap.py::TestVmapOperatorsLegacy::test_real, test/test_legacy_vmap.py::TestVmapOperatorsLegacy::test_reshape, test/test_legacy_vmap.py::TestVmapOperatorsLegacy::test_reshape_as, test/test_legacy_vmap.py::TestVmapOperatorsLegacy::test_result_type, test/test_legacy_vmap.py::TestVmapOperatorsLegacy::test_select, test/test_legacy_vmap.py::TestVmapOperatorsLegacy::test_slice, test/test_legacy_vmap.py::TestVmapOperatorsLegacy::test_split, test/test_legacy_vmap.py::TestVmapOperatorsLegacy::test_squeeze, test/test_legacy_vmap.py::TestVmapOperatorsLegacy::test_stack, test/test_legacy_vmap.py::TestVmapOperatorsLegacy::test_stride, test/test_legacy_vmap.py::TestVmapOperatorsLegacy::test_sum_dim, test/test_legacy_vmap.py::TestVmapOperatorsLegacy::test_t, test/test_legacy_vmap.py::TestVmapOperatorsLegacy::test_tensor_split, test/test_legacy_vmap.py::TestVmapOperatorsLegacy::test_to, test/test_legacy_vmap.py::TestVmapOperatorsLegacy::test_trace, test/test_legacy_vmap.py::TestVmapOperatorsLegacy::test_transpose, test/test_legacy_vmap.py::TestVmapOperatorsLegacy::test_unary_pointwise_ops, test/test_legacy_vmap.py::TestVmapOperatorsLegacy::test_unbind, test/test_legacy_vmap.py::TestVmapOperatorsLegacy::test_unfold, test/test_legacy_vmap.py::TestVmapOperatorsLegacy::test_view, test/test_legacy_vmap.py::TestVmapOperatorsLegacy::test_view_as, test/test_legacy_vmap.py::TestVmapOperatorsLegacy::test_view_as_complex, test/test_legacy_vmap.py::TestVmapOperatorsLegacy::test_view_as_real, test/test_legacy_vmap.py::TestVmapOperatorsLegacy::test_vmap_fallback_check, test/test_legacy_vmap.py::TestVmapOperatorsLegacy::test_vmap_fallback_check_ok, test/test_legacy_vmap.py::TestVmapBatchedGradientLegacyCUDA::test_add_cuda, test/test_legacy_vmap.py::TestVmapBatchedGradientLegacyCUDA::test_binary_cross_entropy_cuda, test/test_legacy_vmap.py::TestVmapBatchedGradientLegacyCUDA::test_diagonal_cuda, test/test_legacy_vmap.py::TestVmapBatchedGradientLegacyCUDA::test_div_cuda, test/test_legacy_vmap.py::TestVmapBatchedGradientLegacyCUDA::test_expand_cuda, test/test_legacy_vmap.py::TestVmapBatchedGradientLegacyCUDA::test_index_cuda, test/test_legacy_vmap.py::TestVmapBatchedGradientLegacyCUDA::test_inplace_manyview_cuda, test/test_legacy_vmap.py::TestVmapBatchedGradientLegacyCUDA::test_inplace_on_view_cuda, test/test_legacy_vmap.py::TestVmapBatchedGradientLegacyCUDA::test_lgamma_cuda, test/test_legacy_vmap.py::TestVmapBatchedGradientLegacyCUDA::test_log1p_cuda, test/test_legacy_vmap.py::TestVmapBatchedGradientLegacyCUDA::test_log_cuda, test/test_legacy_vmap.py::TestVmapBatchedGradientLegacyCUDA::test_logsumexp_cuda, test/test_legacy_vmap.py::TestVmapBatchedGradientLegacyCUDA::test_max_cuda, test/test_legacy_vmap.py::TestVmapBatchedGradientLegacyCUDA::test_median_cuda, test/test_legacy_vmap.py::TestVmapBatchedGradientLegacyCUDA::test_min_cuda, test/test_legacy_vmap.py::TestVmapBatchedGradientLegacyCUDA::test_mul_cuda, test/test_legacy_vmap.py::TestVmapBatchedGradientLegacyCUDA::test_permute_cuda, test/test_legacy_vmap.py::TestVmapBatchedGradientLegacyCUDA::test_reshape_cuda, test/test_legacy_vmap.py::TestVmapBatchedGradientLegacyCUDA::test_select_cuda, test/test_legacy_vmap.py::TestVmapBatchedGradientLegacyCUDA::test_sigmoid_cuda, test/test_legacy_vmap.py::TestVmapBatchedGradientLegacyCUDA::test_slice_cuda, test/test_legacy_vmap.py::TestVmapBatchedGradientLegacyCUDA::test_stack_cuda, test/test_legacy_vmap.py::TestVmapBatchedGradientLegacyCUDA::test_sub_cuda, test/test_legacy_vmap.py::TestVmapBatchedGradientLegacyCUDA::test_threshold_cuda, test/test_legacy_vmap.py::TestVmapBatchedGradientLegacyCUDA::test_trace_cuda, test/test_legacy_vmap.py::TestVmapBatchedGradientLegacyCUDA::test_unrelated_output_cuda, test/test_legacy_vmap.py::TestVmapBatchedGradientLegacyCUDA::test_unrelated_output_multiple_grad_cuda, test/test_legacy_vmap.py::TestVmapBatchedGradientLegacyCUDA::test_vmap_fallback_check, test/test_legacy_vmap.py::TestVmapBatchedGradientLegacyCUDA::test_vmap_fallback_check_ok 2025-08-14T23:35:58.4361535Z 2025-08-14T23:35:58.4361850Z Running test_sparse_csr 3/3 ... [2025-08-14 23:35:58.426457] 2025-08-14T23:35:58.4362525Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T23:35:58.4364671Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_sparse_csr.py', '-m', 'not serial', '--shard-id=3', '--num-shards=3', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 23:35:58.426788] 2025-08-14T23:40:40.7905795Z 2025-08-14T23:40:40.7906446Z PRINTING LOG FILE of test_sparse_csr 3/3 (test/test-reports/test_sparse_csr_3.3_febce375c1d07f5e_.log) 2025-08-14T23:40:40.7908494Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/hypothesis/entry_points.py:23: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. 2025-08-14T23:40:40.7911289Z import pkg_resources 2025-08-14T23:40:40.7912238Z Test results will be stored in test-reports/python-pytest/test_sparse_csr/test_sparse_csr-76f9fd1467c07893.xml 2025-08-14T23:40:40.7913339Z ============================= test session starts ============================== 2025-08-14T23:40:40.7914313Z platform linux -- Python 3.10.18, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-08-14T23:40:40.7915177Z cachedir: .pytest_cache 2025-08-14T23:40:40.7915905Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-08-14T23:40:40.7916479Z rootdir: /var/lib/jenkins/pytorch 2025-08-14T23:40:40.7916750Z configfile: pytest.ini 2025-08-14T23:40:40.7917282Z plugins: rerunfailures-14.0, flakefinder-1.1.0, xdist-3.3.1, hypothesis-5.35.1, cpp-2.3.0, xdoctest-1.1.0, subtests-0.13.1, typeguard-4.3.0 2025-08-14T23:40:40.7917869Z collecting ... collected 4958 items 2025-08-14T23:40:40.7918199Z stepcurrent: Cannot find last run test, not skipping 2025-08-14T23:40:40.8711865Z Running 1666 items in this shard: test/test_sparse_csr.py::TestSparseCSRSampler::test_make_crow_indices, test/test_sparse_csr.py::TestSparseCSRCUDA::test_add_SparseCSC_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_add_SparseCSR_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_add_SparseCSR_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_all_sparse_csr_SparseCSC_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_all_sparse_csr_SparseCSR_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_all_sparse_csr_SparseCSR_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_dense_result_SparseCSR_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_0_n_0_m_1_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_0_n_0_m_25_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_0_n_10_m_0_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_0_n_10_m_0_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_0_n_10_m_1_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_0_n_10_m_1_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_0_n_1_m_0_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_0_n_1_m_0_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_0_n_1_m_1_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_0_n_1_m_25_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_0_n_1_m_25_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_1_n_0_m_1_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_1_n_0_m_1_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_1_n_10_m_0_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_1_n_10_m_1_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_1_n_10_m_25_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_1_n_10_m_25_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_1_n_10_m_25_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_1_n_1_m_0_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_1_n_1_m_1_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_1_n_1_m_1_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_8_n_0_m_0_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_8_n_0_m_0_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_8_n_0_m_1_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_8_n_0_m_25_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_8_n_0_m_25_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_8_n_0_m_25_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_8_n_10_m_0_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_8_n_10_m_0_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_8_n_10_m_0_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_8_n_10_m_1_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_8_n_10_m_25_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_8_n_1_m_0_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_8_n_1_m_0_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_8_n_1_m_25_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_8_n_1_m_25_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_8_n_1_m_25_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmv_shape_11x9_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmv_shape_3x3_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmv_shape_3x3_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_addmv_shape_5x7_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_autograd_dense_output_mm_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_autograd_sparse_csr_unary_abs_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_autograd_sparse_csr_unary_angle_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_autograd_sparse_csr_unary_asinh_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_autograd_sparse_csr_unary_asinh_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_autograd_sparse_csr_unary_atan_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_autograd_sparse_csr_unary_atanh_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_autograd_sparse_csr_unary_conj_physical_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_autograd_sparse_csr_unary_erfinv_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_autograd_sparse_csr_unary_expm1_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_autograd_sparse_csr_unary_expm1_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_autograd_sparse_csr_unary_floor_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_autograd_sparse_csr_unary_isinf_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_autograd_sparse_csr_unary_isinf_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_autograd_sparse_csr_unary_isneginf_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_autograd_sparse_csr_unary_neg_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_autograd_sparse_csr_unary_sinh_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_autograd_sparse_csr_unary_sqrt_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_autograd_sparse_csr_unary_tan_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_autograd_sparse_csr_unary_tanh_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_addmm_block_size_2_int32_noncontiguous_True_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_addmm_block_size_2_int32_noncontiguous_True_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_addmm_block_size_2_int64_noncontiguous_False_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_addmm_block_size_2_int64_noncontiguous_True_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_addmm_block_size_2_int64_noncontiguous_True_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_addmm_block_size_3_int32_noncontiguous_False_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_addmm_block_size_3_int32_noncontiguous_True_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_addmm_block_size_3_int64_noncontiguous_False_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_addmm_block_size_3_int64_noncontiguous_True_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_addmm_block_size_3_int64_noncontiguous_True_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_addmv_block_size_2_int32_noncontiguous_True_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_addmv_block_size_2_int32_noncontiguous_True_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_addmv_block_size_2_int64_noncontiguous_False_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_addmv_block_size_2_int64_noncontiguous_False_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_addmv_block_size_2_int64_noncontiguous_True_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_addmv_block_size_3_int32_noncontiguous_False_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_addmv_block_size_3_int64_noncontiguous_False_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_addmv_block_size_3_int64_noncontiguous_True_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_triangular_solve_block_size_2_int32_noncontiguous_False_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_triangular_solve_block_size_2_int32_noncontiguous_False_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_triangular_solve_block_size_2_int64_noncontiguous_False_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_triangular_solve_block_size_3_int64_noncontiguous_False_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_triangular_solve_block_size_3_int64_noncontiguous_False_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_triangular_solve_block_size_3_int64_noncontiguous_True_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_block_triangular_solve_block_size_3_int64_noncontiguous_True_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_bmm_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_compressed_layout_conversions_coverage_SparseBSC_SparseBSR_cuda, test/test_sparse_csr.py::TestSparseCSRCUDA::test_compressed_layout_conversions_coverage_SparseCSC_SparseCSR_cuda, test/test_sparse_csr.py::TestSparseCSRCUDA::test_compressed_layout_conversions_coverage_SparseCSR_SparseBSR_cuda, test/test_sparse_csr.py::TestSparseCSRCUDA::test_coo_csr_conversion_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_coo_csr_conversion_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_coo_csr_conversion_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_csr_coo_conversion_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_csr_coo_conversion_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_csr_coo_conversion_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_csr_coo_conversion_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_csr_coo_conversion_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_csr_is_contiguous_cuda, test/test_sparse_csr.py::TestSparseCSRCUDA::test_csr_matvec_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_csr_matvec_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_csr_nnz_cuda, test/test_sparse_csr.py::TestSparseCSRCUDA::test_csr_to_block_csr_blocksize_2_cuda_float64_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_dense_to_from_sparse_compressed_SparseBSC_NonBatched_NonHybrid_cuda, test/test_sparse_csr.py::TestSparseCSRCUDA::test_dense_to_from_sparse_compressed_SparseCSC_Batched_Hybrid_cuda, test/test_sparse_csr.py::TestSparseCSRCUDA::test_dense_to_from_sparse_compressed_SparseCSC_NonBatched_Hybrid_cuda, test/test_sparse_csr.py::TestSparseCSRCUDA::test_dense_to_from_sparse_compressed_SparseCSC_NonBatched_NonHybrid_cuda, test/test_sparse_csr.py::TestSparseCSRCUDA::test_dense_to_from_sparse_compressed_SparseCSR_Batched_Hybrid_cuda, test/test_sparse_csr.py::TestSparseCSRCUDA::test_direct_coo_csr_conversion_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_direct_coo_csr_conversion_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_direct_coo_csr_conversion_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_direct_coo_csr_conversion_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_exercise_detach_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_exercise_detach_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_exercise_detach_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_linalg_solve_sparse_csr_cusolver_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_mm_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_mul_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_mul_scalar_enable_hybrid_False_SparseBSC_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_mul_scalar_enable_hybrid_False_SparseBSC_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_mul_scalar_enable_hybrid_False_SparseBSC_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_mul_scalar_enable_hybrid_False_SparseBSC_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_mul_scalar_enable_hybrid_False_SparseBSC_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_mul_scalar_enable_hybrid_False_SparseBSR_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_mul_scalar_enable_hybrid_False_SparseBSR_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_mul_scalar_enable_hybrid_False_SparseBSR_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_mul_scalar_enable_hybrid_False_SparseBSR_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_mul_scalar_enable_hybrid_False_SparseCSC_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_mul_scalar_enable_hybrid_False_SparseCSC_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_mul_scalar_enable_hybrid_False_SparseCSC_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_mul_scalar_enable_hybrid_False_SparseCSR_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_resize_as_sparse_compressed_SparseBSC_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_resize_as_sparse_compressed_SparseCSC_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_resize_as_sparse_compressed_SparseCSC_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_resize_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_resize_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_resize_errors_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_resize_errors_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_resize_errors_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_resize_errors_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sampled_addmm_errors_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sampled_addmm_errors_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sampled_addmm_zero_sized_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sampled_addmm_zero_sized_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseBSC_int32_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseBSC_int32_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseBSC_int64_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseBSC_int64_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseBSC_int64_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseBSC_int64_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseBSR_int32_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseBSR_int32_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseBSR_int32_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseBSR_int32_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseBSR_int64_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseBSR_int64_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseBSR_int64_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseBSR_int64_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseBSR_int64_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseCSC_int32_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseCSC_int32_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseCSC_int64_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseCSC_int64_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseCSC_int64_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseCSC_int64_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseCSC_int64_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseCSC_int64_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseCSC_int64_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseCSC_int64_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseCSR_int32_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseCSR_int32_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseCSR_int32_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseCSR_int64_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseCSR_int64_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_add_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_add_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_add_errors_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_add_errors_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_addmm_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_addmm_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csc_to_dense_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csc_to_dense_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csc_to_dense_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csc_to_dense_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csc_to_dense_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csc_to_dense_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_from_dense_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_from_dense_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_from_dense_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_from_dense_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_to_dense_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_to_dense_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_to_dense_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_to_dense_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_to_dense_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_abs_cuda_complex32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_abs_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_abs_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_angle_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_angle_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_angle_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_angle_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_angle_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_asin_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_asin_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_asin_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_asin_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_asin_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_asin_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_asinh_cuda_complex32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_asinh_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_asinh_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_asinh_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_asinh_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_asinh_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_atan_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_atan_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_atan_cuda_complex32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_atan_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_atan_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_atan_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_atan_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_atanh_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_atanh_cuda_complex32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_atanh_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_ceil_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_ceil_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_ceil_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_ceil_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_conj_physical_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_conj_physical_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_conj_physical_cuda_complex32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_conj_physical_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_conj_physical_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_conj_physical_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_conj_physical_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_deg2rad_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_deg2rad_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_erf_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_erf_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_erf_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_erf_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_erfinv_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_erfinv_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_expm1_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_expm1_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_expm1_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_expm1_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_expm1_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_floor_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_floor_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_floor_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_floor_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_floor_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_frac_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_frac_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_isinf_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_isinf_cuda_complex32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_isinf_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_isinf_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_isinf_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_isinf_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_isnan_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_isnan_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_isnan_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_isnan_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_isnan_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_isneginf_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_isneginf_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_isneginf_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_isposinf_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_isposinf_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_isposinf_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_isposinf_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_log1p_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_log1p_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_log1p_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_log1p_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_log1p_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_log1p_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_neg_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_neg_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_neg_cuda_complex32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_neg_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_neg_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_neg_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_neg_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_neg_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_neg_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_nn_functional_relu_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_positive_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_positive_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_positive_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_positive_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_rad2deg_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_rad2deg_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_round_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_round_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_round_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_round_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_sgn_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_signbit_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_signbit_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_sin_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_sin_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_sin_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_sin_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_sinh_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_sinh_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_sinh_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_sinh_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_sinh_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_sqrt_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_sqrt_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_sqrt_cuda_complex32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_sqrt_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_sqrt_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_sqrt_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_tan_cuda_complex32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_tan_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_tan_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_tan_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_tan_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_tan_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_tanh_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_tanh_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_trunc_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_trunc_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_abs_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_abs_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_abs_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_angle_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_angle_cuda_complex32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_angle_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_angle_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_asin_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_asin_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_asin_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_asin_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_asinh_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_asinh_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_asinh_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_asinh_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_atan_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_atan_cuda_complex32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_atan_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_atan_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_atanh_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_atanh_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_atanh_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_atanh_cuda_complex32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_atanh_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_ceil_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_ceil_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_ceil_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_conj_physical_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_conj_physical_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_conj_physical_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_conj_physical_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_conj_physical_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_conj_physical_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_deg2rad_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_deg2rad_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_deg2rad_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_deg2rad_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_erf_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_erf_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_erf_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_erfinv_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_erfinv_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_erfinv_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_erfinv_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_expm1_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_expm1_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_expm1_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_expm1_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_expm1_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_expm1_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_floor_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_floor_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_floor_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_isinf_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_isinf_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_isinf_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_isinf_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_isinf_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_isnan_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_isnan_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_isnan_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_isnan_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_isnan_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_isnan_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_isnan_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_isnan_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_isneginf_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_isneginf_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_isneginf_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_isposinf_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_isposinf_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_log1p_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_log1p_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_log1p_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_log1p_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_log1p_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_neg_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_neg_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_nn_functional_relu_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_nn_functional_relu_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_nn_functional_relu_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_nn_functional_relu_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_nn_functional_relu_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_nn_functional_relu_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_nn_functional_relu_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_positive_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_positive_cuda_complex32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_positive_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_positive_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_positive_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_rad2deg_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_rad2deg_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_rad2deg_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_round_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_round_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_round_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_sgn_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_sgn_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_sgn_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_sgn_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_sgn_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_sgn_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_sgn_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_sign_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_sign_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_sign_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_sign_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_signbit_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_signbit_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_signbit_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_signbit_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_signbit_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_sin_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_sin_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_sin_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_sin_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_sinh_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_sinh_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_sinh_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_sinh_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_sqrt_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_sqrt_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_sqrt_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_sqrt_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_sqrt_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_tan_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_tan_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_tan_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_tan_cuda_complex32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_tan_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_tanh_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_tanh_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_tanh_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_tanh_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_trunc_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_trunc_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_trunc_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_mm_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_mm_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_mm_reduce_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_mm_reduce_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_to_sparse_compressed_SparseCSR_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_triangular_solve_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sum_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sum_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sum_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_sum_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_transpose_SparseBSC_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_transpose_SparseBSC_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_transpose_SparseBSC_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_transpose_SparseBSR_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_transpose_SparseBSR_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_transpose_SparseCSC_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_transpose_SparseCSC_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_transpose_SparseCSC_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_transpose_SparseCSC_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_transpose_SparseCSR_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_transpose_SparseCSR_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_transpose_SparseCSR_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_transpose_SparseCSR_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_transpose_SparseCSR_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_transpose_SparseCSR_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_transpose_SparseCSR_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_abs_cuda_complex32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_abs_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_abs_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_abs_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_angle_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_angle_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_angle_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_angle_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_angle_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_asin_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_asinh_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_asinh_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_asinh_cuda_complex32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_asinh_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_atan_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_atan_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_atan_cuda_complex32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_atan_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_atanh_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_atanh_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_atanh_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_atanh_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_ceil_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_ceil_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_ceil_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_ceil_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_conj_physical_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_conj_physical_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_conj_physical_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_conj_physical_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_conj_physical_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_deg2rad_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_deg2rad_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_deg2rad_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_deg2rad_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_deg2rad_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_deg2rad_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_erf_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_erf_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_erf_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_erf_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_erf_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_erf_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_erfinv_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_erfinv_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_erfinv_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_expm1_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_expm1_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_expm1_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_expm1_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_expm1_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_floor_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_floor_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_floor_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_floor_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_isinf_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_isinf_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_isinf_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_isinf_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_isinf_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_isnan_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_isnan_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_isnan_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_isneginf_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_isposinf_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_isposinf_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_isposinf_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_log1p_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_log1p_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_log1p_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_neg_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_neg_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_neg_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_neg_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_nn_functional_relu_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_nn_functional_relu_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_nn_functional_relu_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_positive_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_positive_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_positive_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_positive_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_positive_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_positive_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_rad2deg_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_rad2deg_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_round_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_round_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_sgn_cuda_complex128, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_sgn_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_sgn_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_sgn_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_sgn_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_sign_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_sign_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_sign_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_signbit_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_signbit_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_signbit_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_signbit_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_signbit_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_sin_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_sin_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_sin_cuda_int64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_sin_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_sinh_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_sinh_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_sinh_cuda_int8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_sinh_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_sqrt_cuda_bool, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_sqrt_cuda_complex32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_sqrt_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_sqrt_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_tan_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_tan_cuda_int16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_tan_cuda_int32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_tan_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_tanh_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_tanh_cuda_complex64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_tanh_cuda_float64, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_tanh_cuda_uint8, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_trunc_cuda_float16, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_trunc_cuda_float32, test/test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_trunc_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_clone_SparseBSC_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_clone_SparseBSC_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_clone_SparseBSC_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_clone_SparseBSR_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_clone_SparseBSR_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_clone_SparseBSR_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_clone_SparseBSR_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_clone_SparseBSR_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_clone_SparseCSC_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_clone_SparseCSC_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_clone_SparseCSR_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_clone_SparseCSR_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_abs_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_abs_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_abs_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_angle_cuda_complex32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_angle_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_angle_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_angle_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_asin_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_asin_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_asin_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_asin_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_asin_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_asin_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_asinh_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_asinh_cuda_complex32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_asinh_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_asinh_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_asinh_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_atan_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_atan_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_atan_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_atan_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_atan_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_atanh_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_atanh_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_atanh_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_atanh_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_ceil_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_ceil_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_ceil_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_conj_physical_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_conj_physical_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_conj_physical_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_conj_physical_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_conj_physical_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_deg2rad_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_deg2rad_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_deg2rad_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_deg2rad_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_deg2rad_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_deg2rad_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_erf_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_erf_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_erf_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_erfinv_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_erfinv_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_erfinv_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_erfinv_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_expm1_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_expm1_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_expm1_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_expm1_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_expm1_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_floor_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_floor_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_frac_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_frac_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_isinf_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_isinf_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_isinf_cuda_complex32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_isinf_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_isinf_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_isnan_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_isnan_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_isnan_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_isnan_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_isnan_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_isnan_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_isnan_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_isneginf_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_isneginf_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_isposinf_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_isposinf_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_log1p_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_log1p_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_log1p_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_masked_amax_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_masked_amax_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_masked_amax_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_masked_amin_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_masked_amin_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_masked_mean_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_masked_mean_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_masked_prod_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_masked_prod_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_masked_prod_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_masked_prod_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_masked_prod_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_masked_prod_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_masked_sum_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_masked_sum_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_masked_sum_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_masked_sum_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_masked_sum_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_masked_sum_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_mul_cuda_complex32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_mul_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_mul_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_nn_functional_relu_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_positive_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_positive_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_positive_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_positive_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_rad2deg_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_rad2deg_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_rad2deg_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_randn_like_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_randn_like_cuda_complex32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_randn_like_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_round_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_sgn_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_sgn_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_sgn_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_sgn_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_sign_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_sign_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_sign_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_sign_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_signbit_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_signbit_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_signbit_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_sin_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_sin_cuda_complex32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_sin_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_sinh_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_sinh_cuda_complex32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_sinh_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_sinh_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_sqrt_cuda_complex32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_sqrt_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_sqrt_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_sqrt_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_sqrt_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_sum_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_sum_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_sum_cuda_complex32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_sum_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_sum_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_tan_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_tan_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_tan_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_tan_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_tan_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_tanh_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_tanh_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_tanh_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_tanh_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_to_sparse_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_to_sparse_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_to_sparse_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_trunc_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_trunc_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_trunc_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_trunc_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_zeros_like_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_zeros_like_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_zeros_like_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_abs_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_abs_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_abs_cuda_complex32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_abs_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_abs_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_abs_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_abs_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_angle_cuda_complex32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_angle_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_angle_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_angle_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_angle_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_angle_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_asin_cuda_complex32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_asin_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_asinh_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_asinh_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_asinh_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_asinh_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_atan_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_atan_cuda_complex32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_atan_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_atan_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_atan_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_atan_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_atanh_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_atanh_cuda_complex32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_atanh_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_atanh_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_atanh_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_ceil_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_ceil_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_ceil_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_conj_physical_cuda_complex32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_conj_physical_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_conj_physical_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_deg2rad_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_deg2rad_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_deg2rad_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_deg2rad_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_deg2rad_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_erf_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_erf_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_erf_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_erfinv_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_erfinv_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_erfinv_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_erfinv_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_expm1_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_expm1_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_expm1_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_expm1_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_floor_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_floor_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_frac_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_frac_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_isinf_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_isinf_cuda_complex32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_isinf_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_isinf_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_isinf_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_isnan_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_isnan_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_isnan_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_isneginf_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_isneginf_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_isneginf_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_isposinf_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_isposinf_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_log1p_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_log1p_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_masked_amax_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_masked_amax_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_masked_amax_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_masked_amin_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_masked_amin_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_masked_mean_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_masked_mean_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_masked_mean_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_masked_mean_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_masked_prod_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_masked_prod_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_masked_prod_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_masked_prod_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_masked_prod_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_masked_prod_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_masked_prod_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_masked_sum_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_mul_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_mul_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_mul_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_mul_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_mul_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_mul_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_neg_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_neg_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_neg_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_neg_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_neg_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_nn_functional_relu_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_positive_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_positive_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_positive_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_positive_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_positive_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_rad2deg_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_rad2deg_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_rad2deg_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_rad2deg_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_randn_like_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_randn_like_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_randn_like_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_round_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_round_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_round_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sgn_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sgn_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sgn_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sgn_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sgn_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sign_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sign_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sign_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sign_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sign_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sign_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_signbit_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_signbit_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_signbit_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sin_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sin_cuda_complex32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sin_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sinh_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sinh_cuda_complex32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sinh_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sinh_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sinh_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sinh_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sqrt_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sqrt_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sqrt_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sqrt_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sqrt_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sqrt_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sum_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sum_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sum_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sum_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_tan_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_tan_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_tan_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_tan_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_tanh_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_tanh_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_tanh_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_tanh_cuda_complex32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_tanh_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_tanh_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_tanh_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_tanh_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_to_sparse_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_to_sparse_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_to_sparse_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_to_sparse_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_to_sparse_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_trunc_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_trunc_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_trunc_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_trunc_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_trunc_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_trunc_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_zeros_like_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_zeros_like_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_zeros_like_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_abs_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_abs_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_abs_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_abs_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_angle_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_angle_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_angle_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_angle_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_angle_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_asin_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_asin_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_asin_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_asin_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_asin_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_asin_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_asinh_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_asinh_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_asinh_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_asinh_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_atan_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_atan_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_atan_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_atan_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_atanh_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_atanh_cuda_complex32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_atanh_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_atanh_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_atanh_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_ceil_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_ceil_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_ceil_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_ceil_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_conj_physical_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_conj_physical_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_conj_physical_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_conj_physical_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_deg2rad_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_deg2rad_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_deg2rad_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_deg2rad_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_deg2rad_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_erf_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_erfinv_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_erfinv_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_expm1_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_expm1_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_expm1_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_expm1_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_expm1_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_expm1_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_expm1_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_floor_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_floor_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_floor_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_floor_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_frac_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_frac_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_frac_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_isinf_cuda_complex32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_isinf_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_isinf_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_isnan_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_isnan_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_isnan_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_isnan_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_isnan_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_isneginf_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_isneginf_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_isneginf_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_isneginf_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_isneginf_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_isposinf_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_isposinf_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_isposinf_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_isposinf_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_isposinf_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_log1p_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_log1p_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_log1p_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_masked_amax_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_masked_amin_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_masked_amin_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_masked_amin_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_masked_amin_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_masked_amin_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_masked_mean_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_masked_prod_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_masked_prod_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_masked_prod_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_masked_sum_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_masked_sum_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_mul_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_mul_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_mul_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_mul_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_neg_cuda_complex32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_neg_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_neg_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_neg_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_neg_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_nn_functional_relu_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_nn_functional_relu_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_nn_functional_relu_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_nn_functional_relu_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_nn_functional_relu_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_nn_functional_relu_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_positive_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_positive_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_positive_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_positive_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_positive_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_positive_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_rad2deg_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_rad2deg_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_rad2deg_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_randn_like_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_randn_like_cuda_complex32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_randn_like_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_round_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_round_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_round_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_round_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sgn_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sgn_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sgn_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sgn_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sign_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sign_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sign_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sign_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_signbit_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_signbit_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_signbit_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_signbit_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sin_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sin_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sin_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sinh_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sinh_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sinh_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sinh_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sinh_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sinh_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sqrt_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sqrt_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sqrt_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sqrt_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sum_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sum_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sum_cuda_complex32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sum_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sum_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sum_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sum_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_tan_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_tan_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_tan_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_tan_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_tanh_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_tanh_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_to_sparse_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_to_sparse_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_trunc_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_trunc_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_trunc_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_trunc_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_zeros_like_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_zeros_like_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_zeros_like_cuda_complex32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_zeros_like_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_zeros_like_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_abs_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_abs_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_abs_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_abs_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_abs_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_angle_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_asin_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_asin_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_asin_cuda_complex32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_asin_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_asin_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_asinh_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_asinh_cuda_complex32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_asinh_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_asinh_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_atan_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_atan_cuda_complex32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_atan_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_atan_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_atanh_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_atanh_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_atanh_cuda_complex32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_atanh_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_atanh_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_atanh_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_ceil_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_ceil_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_ceil_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_ceil_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_conj_physical_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_conj_physical_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_conj_physical_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_conj_physical_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_deg2rad_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_deg2rad_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_deg2rad_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_erf_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_erf_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_erf_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_erfinv_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_erfinv_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_expm1_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_expm1_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_expm1_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_expm1_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_expm1_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_floor_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_floor_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_floor_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_floor_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_floor_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_frac_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_frac_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_isinf_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_isinf_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_isinf_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_isinf_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_isnan_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_isnan_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_isnan_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_isnan_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_isneginf_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_isneginf_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_isneginf_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_isneginf_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_isposinf_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_isposinf_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_isposinf_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_log1p_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_log1p_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_log1p_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_log1p_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_log1p_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_masked_amax_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_masked_amax_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_masked_amax_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_masked_amin_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_masked_amin_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_masked_amin_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_masked_mean_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_masked_mean_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_masked_prod_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_masked_prod_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_masked_prod_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_masked_sum_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_masked_sum_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_masked_sum_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_mul_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_mul_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_mul_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_mul_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_mul_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_neg_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_neg_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_nn_functional_relu_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_nn_functional_relu_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_nn_functional_relu_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_nn_functional_relu_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_positive_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_positive_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_positive_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_rad2deg_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_rad2deg_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_rad2deg_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_randn_like_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_randn_like_cuda_complex32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_round_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_round_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_round_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_round_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_sgn_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_sgn_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_sgn_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_sgn_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_sgn_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_sign_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_sign_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_sign_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_sin_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_sin_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_sin_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_sin_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_sinh_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_sinh_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_sinh_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_sinh_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_sqrt_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_sqrt_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_sqrt_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_sqrt_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_sqrt_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_sum_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_sum_cuda_complex32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_sum_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_sum_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_sum_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_tan_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_tan_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_tan_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_tan_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_tanh_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_tanh_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_tanh_cuda_complex32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_tanh_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_tanh_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_tanh_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_tanh_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_tanh_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_to_sparse_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_to_sparse_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_trunc_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_trunc_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_trunc_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_zeros_like_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_zeros_like_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_zeros_like_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_zeros_like_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_zeros_like_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_SparseBSC_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_SparseBSC_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_SparseBSC_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_SparseBSC_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_SparseBSC_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_SparseBSR_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_SparseBSR_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_SparseBSR_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_SparseBSR_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_SparseBSR_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_SparseCSC_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_SparseCSC_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_SparseCSC_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_SparseCSR_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_SparseCSR_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_SparseCSR_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_SparseCSR_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_errors_SparseBSC_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_errors_SparseBSC_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_errors_SparseBSC_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_errors_SparseBSR_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_errors_SparseBSR_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_errors_SparseBSR_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_errors_SparseBSR_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_errors_SparseCSC_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_errors_SparseCSC_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_errors_SparseCSC_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_errors_SparseCSR_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_errors_SparseCSR_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_errors_SparseCSR_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_SparseCSC_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_SparseCSC_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_SparseCSC_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_SparseCSC_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_SparseCSC_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_SparseCSC_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_SparseCSR_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_SparseCSR_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_SparseCSR_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_SparseCSR_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_errors_SparseCSC_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_errors_SparseCSC_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_errors_SparseCSC_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_errors_SparseCSC_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_errors_SparseCSC_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_errors_SparseCSR_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_errors_SparseCSR_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSC_SparseBSC_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSC_SparseBSC_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSC_SparseBSC_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSC_SparseCSC_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSC_SparseCSC_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSC_SparseCSC_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSC_SparseCSR_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSC_SparseCSR_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSC_SparseCSR_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSR_SparseBSC_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSR_SparseBSC_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSR_SparseBSC_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSR_SparseBSC_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSR_SparseBSC_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSR_SparseBSC_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSR_SparseBSC_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSR_SparseBSC_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSR_SparseCSC_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSR_SparseCSC_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSR_SparseCSC_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSR_SparseCSC_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSR_SparseCSC_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSR_SparseCSC_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSR_SparseCSC_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSR_SparseCSR_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSR_SparseCSR_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSR_SparseCSR_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSR_SparseCSR_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSC_SparseBSC_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSC_SparseBSC_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSC_SparseBSC_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSC_SparseBSR_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSC_SparseBSR_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSC_SparseBSR_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSC_SparseBSR_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSC_SparseBSR_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSC_SparseCSC_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSC_SparseCSC_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSC_SparseCSR_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSC_SparseCSR_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSC_SparseCSR_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSC_SparseCSR_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSC_SparseCSR_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSR_SparseBSR_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSR_SparseBSR_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSR_SparseBSR_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSR_SparseBSR_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSR_SparseBSR_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSR_SparseBSR_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSR_SparseBSR_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSR_SparseCSC_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSR_SparseCSC_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSR_SparseCSC_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSR_SparseCSC_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSR_SparseCSR_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSR_SparseCSR_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSR_SparseCSR_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSR_SparseCSR_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_invalid_input_SparseCSR_target_sparse_compressed_tensor_cuda, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_invalid_input_csr_large_cuda, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_layout_SparseBSC_cuda, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_pickle_SparseBSR_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_pickle_SparseCSC_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_pickle_SparseCSR_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_print_SparseCSC_cuda, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseBSC_int32_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseBSC_int32_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseBSC_int32_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseBSC_int32_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseBSC_int64_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseBSC_int64_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseBSC_int64_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseBSC_int64_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseBSR_int32_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseBSR_int32_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseBSR_int32_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseBSR_int32_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseBSR_int64_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseBSR_int64_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseBSR_int64_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseBSR_int64_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseCSC_int32_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseCSC_int32_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseCSC_int32_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseCSC_int32_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseCSC_int64_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseCSC_int64_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseCSC_int64_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseCSC_int64_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseCSC_int64_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseCSR_int32_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseCSR_int32_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseCSR_int32_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseCSR_int32_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseCSR_int32_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseCSR_int32_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseCSR_int64_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseCSR_int64_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseCSR_int64_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseCSR_int64_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseCSR_int64_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_list_SparseBSC_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_list_SparseBSC_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_list_SparseBSC_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_list_SparseBSR_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_list_SparseBSR_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_list_SparseBSR_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_list_SparseBSR_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_list_SparseBSR_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_list_SparseCSC_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_list_SparseCSC_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_list_SparseCSC_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_list_SparseCSR_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_list_SparseCSR_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_list_SparseCSR_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_list_SparseCSR_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_list_SparseCSR_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_tensor_SparseBSC_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_tensor_SparseBSC_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_tensor_SparseBSC_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_tensor_SparseBSC_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_tensor_SparseBSR_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_tensor_SparseBSR_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_tensor_SparseBSR_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_tensor_SparseCSC_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_tensor_SparseCSC_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_tensor_SparseCSC_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_tensor_SparseCSC_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_tensor_SparseCSC_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_tensor_SparseCSR_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_tensor_SparseCSR_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_list_SparseBSC_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_list_SparseBSC_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_list_SparseBSC_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_list_SparseBSC_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_list_SparseBSC_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_list_SparseBSR_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_list_SparseBSR_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_list_SparseBSR_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_list_SparseBSR_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_list_SparseBSR_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_list_SparseBSR_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_list_SparseCSC_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_list_SparseCSC_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_list_SparseCSC_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_list_SparseCSC_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_list_SparseCSC_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_list_SparseCSC_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_list_SparseCSC_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_list_SparseCSR_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_list_SparseCSR_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_tensor_SparseBSC_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_tensor_SparseBSC_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_tensor_SparseBSC_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_tensor_SparseBSC_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_tensor_SparseBSR_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_tensor_SparseBSR_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_tensor_SparseBSR_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_tensor_SparseBSR_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_tensor_SparseCSC_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_tensor_SparseCSC_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_tensor_SparseCSC_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_tensor_SparseCSC_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_tensor_SparseCSC_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_tensor_SparseCSR_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_tensor_SparseCSR_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_tensor_SparseCSR_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseBSC_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseBSR_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseBSR_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseBSR_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseBSR_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseBSR_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseCSC_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseCSC_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseCSC_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseCSC_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseCSC_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseCSC_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseCSR_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseCSR_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseBSC_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseBSC_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseBSC_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseBSR_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseBSR_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseBSR_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseCSC_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseCSC_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseCSR_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseCSR_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseCSR_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseCSR_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseCSR_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseCSR_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseBSC_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseBSC_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseBSC_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseBSC_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseBSC_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseBSR_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseBSR_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseCSC_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseCSC_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseCSC_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseCSC_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseCSR_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseCSR_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseCSR_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseCSR_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseBSC_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseBSC_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseBSC_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseBSC_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseBSC_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseBSR_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseBSR_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseBSR_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseBSR_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseBSR_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseCSC_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseCSC_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseCSC_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseCSC_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseCSR_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseCSR_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_tensor_with_dims_SparseBSC_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_tensor_with_dims_SparseBSC_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_tensor_with_dims_SparseBSC_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_tensor_with_dims_SparseBSC_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_tensor_with_dims_SparseBSC_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_tensor_with_dims_SparseBSR_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_tensor_with_dims_SparseBSR_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_tensor_with_dims_SparseBSR_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_tensor_with_dims_SparseBSR_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_tensor_with_dims_SparseBSR_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_tensor_with_dims_SparseCSC_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_tensor_with_dims_SparseCSC_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_tensor_with_dims_SparseCSC_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_tensor_with_dims_SparseCSC_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_to_dtype_SparseBSC_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_to_dtype_SparseBSC_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_to_dtype_SparseBSC_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_to_dtype_SparseBSC_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_to_dtype_SparseBSC_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_to_dtype_SparseBSR_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_to_dtype_SparseBSR_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_to_dtype_SparseBSR_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_to_dtype_SparseCSC_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_to_dtype_SparseCSC_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_to_dtype_SparseCSR_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_to_dtype_SparseCSR_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_validate_SparseBSC_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_validate_SparseBSC_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_validate_SparseBSC_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_validate_SparseBSC_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_validate_SparseBSR_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_validate_SparseBSR_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_validate_SparseBSR_cuda_float64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_validate_SparseBSR_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_validate_SparseBSR_cuda_uint8, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_validate_SparseCSC_cuda_bool, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_validate_SparseCSC_cuda_complex128, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_validate_SparseCSC_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_validate_SparseCSC_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_validate_SparseCSC_cuda_int32, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_validate_SparseCSC_cuda_int64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_validate_SparseCSR_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_validate_SparseCSR_cuda_complex64, test/test_sparse_csr.py::TestSparseCompressedCUDA::test_validate_SparseCSR_cuda_int16, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_TensorAsKey_cuda, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_bsr_dense_bmm_block_size_32_int32_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_bsr_dense_bmm_block_size_32_int64_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_bsr_dense_bmm_block_size_32_int64_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_bsr_dense_bmm_block_size_64_int32_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_bsr_dense_bmm_block_size_64_int64_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_bsr_dense_bmm_error_messages_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_bsr_scatter_mm_blocksize_16_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_bsr_scatter_mm_blocksize_16x32_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_bsr_scatter_mm_blocksize_2_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_bsr_scatter_mm_blocksize_2x3_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_bsr_scatter_mm_blocksize_64_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_bsr_scatter_mm_blocksize_64_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_bsr_softmax_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op__int_bsr_dense_addmm_blocksize_16_out_dtype_int32_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op__int_bsr_dense_addmm_blocksize_16_out_dtype_int32_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op__int_bsr_dense_addmm_blocksize_16_out_dtype_unspecified_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op__int_bsr_dense_addmm_blocksize_16x32_out_dtype_int32_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op__int_bsr_dense_addmm_blocksize_16x32_out_dtype_int32_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op__int_bsr_dense_addmm_blocksize_16x32_out_dtype_int32_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op__int_bsr_dense_addmm_blocksize_16x32_out_dtype_unspecified_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op__int_bsr_dense_addmm_blocksize_16x32_out_dtype_unspecified_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op__int_bsr_dense_addmm_blocksize_32_out_dtype_int32_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op__int_bsr_dense_addmm_blocksize_32_out_dtype_unspecified_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op__int_bsr_dense_addmm_blocksize_32_out_dtype_unspecified_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op_bsr_dense_addmm_blocksize_16_out_dtype_int32_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op_bsr_dense_addmm_blocksize_16_out_dtype_int32_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op_bsr_dense_addmm_blocksize_16_out_dtype_int32_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op_bsr_dense_addmm_blocksize_16_out_dtype_unspecified_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op_bsr_dense_addmm_blocksize_16_out_dtype_unspecified_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op_bsr_dense_addmm_blocksize_16x32_out_dtype_int32_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op_bsr_dense_addmm_blocksize_16x32_out_dtype_unspecified_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op_bsr_dense_addmm_blocksize_32_out_dtype_int32_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op_bsr_dense_linear_blocksize_16_out_dtype_unspecified_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op_bsr_dense_linear_blocksize_16_out_dtype_unspecified_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op_bsr_dense_linear_blocksize_16x32_out_dtype_int32_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op_bsr_dense_linear_blocksize_16x32_out_dtype_unspecified_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op_bsr_dense_linear_blocksize_32_out_dtype_int32_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op_bsr_dense_linear_blocksize_32_out_dtype_unspecified_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op_bsr_dense_mm_blocksize_16_out_dtype_int32_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op_bsr_dense_mm_blocksize_16x32_out_dtype_int32_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op_bsr_dense_mm_blocksize_16x32_out_dtype_int32_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op_bsr_dense_mm_blocksize_16x32_out_dtype_int32_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op_bsr_dense_mm_blocksize_16x32_out_dtype_unspecified_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op_bsr_dense_mm_blocksize_32_out_dtype_int32_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op_bsr_dense_mm_blocksize_32_out_dtype_unspecified_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_sampled_addmm_block_size_32_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_sampled_addmm_block_size_32_cuda_float16, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_scaled_dot_product_attention_block_size_32_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_scaled_dot_product_attention_block_size_64_cuda_float32, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_scatter_mm_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_tune_op__int_bsr_dense_addmm_out_dtype_int32_cuda_int8, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_tune_op__int_bsr_dense_addmm_out_dtype_unspecified_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_tune_op_bsr_dense_addmm_out_dtype_int32_cuda_bfloat16, test/test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_tune_op_bsr_dense_addmm_out_dtype_int32_cuda_float16 2025-08-14T23:40:40.9160231Z 2025-08-14T23:40:40.9160519Z test_sparse_csr.py::TestSparseCSRSampler::test_make_crow_indices PASSED [0.3731s] [ 0%] 2025-08-14T23:40:40.9161021Z test_sparse_csr.py::TestSparseCSRCUDA::test_add_SparseCSC_cuda_float32 PASSED [0.5919s] [ 0%] 2025-08-14T23:40:40.9161512Z test_sparse_csr.py::TestSparseCSRCUDA::test_add_SparseCSR_cuda_float32 PASSED [0.1546s] [ 0%] 2025-08-14T23:40:40.9161995Z test_sparse_csr.py::TestSparseCSRCUDA::test_add_SparseCSR_cuda_float64 PASSED [0.1560s] [ 0%] 2025-08-14T23:40:40.9162546Z test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_all_sparse_csr_SparseCSC_cuda_float64 PASSED [1.3329s] [ 0%] 2025-08-14T23:40:40.9163220Z test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_all_sparse_csr_SparseCSR_cuda_float32 PASSED [0.0473s] [ 0%] 2025-08-14T23:40:40.9163889Z test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_all_sparse_csr_SparseCSR_cuda_float64 PASSED [0.0472s] [ 0%] 2025-08-14T23:40:40.9164706Z test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_dense_result_SparseCSR_cuda_complex128 SKIPPED [0.0010s] (Only runs on cpu) [ 0%] 2025-08-14T23:40:40.9165356Z test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_0_n_0_m_1_cuda_float64 PASSED [0.1125s] [ 0%] 2025-08-14T23:40:40.9166038Z test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_0_n_0_m_25_cuda_float32 PASSED [0.0043s] [ 0%] 2025-08-14T23:40:40.9166647Z test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_0_n_10_m_0_cuda_complex128 PASSED [0.0040s] [ 0%] 2025-08-14T23:40:40.9167253Z test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_0_n_10_m_0_cuda_float64 PASSED [0.0037s] [ 0%] 2025-08-14T23:40:40.9167854Z test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_0_n_10_m_1_cuda_complex64 PASSED [0.0064s] [ 0%] 2025-08-14T23:40:40.9168456Z test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_0_n_10_m_1_cuda_float32 PASSED [0.0043s] [ 0%] 2025-08-14T23:40:40.9169047Z test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_0_n_1_m_0_cuda_float32 PASSED [0.0038s] [ 0%] 2025-08-14T23:40:40.9169706Z test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_0_n_1_m_0_cuda_float64 PASSED [0.0038s] [ 0%] 2025-08-14T23:40:40.9170312Z test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_0_n_1_m_1_cuda_complex64 PASSED [0.0044s] [ 1%] 2025-08-14T23:40:40.9170914Z test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_0_n_1_m_25_cuda_float32 PASSED [0.0044s] [ 1%] 2025-08-14T23:40:40.9171506Z test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_0_n_1_m_25_cuda_float64 PASSED [0.0044s] [ 1%] 2025-08-14T23:40:40.9172166Z test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_1_n_0_m_1_cuda_complex64 PASSED [0.0039s] [ 1%] 2025-08-14T23:40:40.9172764Z test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_1_n_0_m_1_cuda_float64 PASSED [0.0038s] [ 1%] 2025-08-14T23:40:40.9173355Z test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_1_n_10_m_0_cuda_float64 PASSED [0.0040s] [ 1%] 2025-08-14T23:40:40.9173952Z test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_1_n_10_m_1_cuda_float32 PASSED [0.0050s] [ 1%] 2025-08-14T23:40:40.9174556Z test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_1_n_10_m_25_cuda_complex64 PASSED [0.0056s] [ 1%] 2025-08-14T23:40:40.9175162Z test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_1_n_10_m_25_cuda_float32 PASSED [0.0050s] [ 1%] 2025-08-14T23:40:40.9175766Z test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_1_n_10_m_25_cuda_float64 PASSED [0.0050s] [ 1%] 2025-08-14T23:40:40.9176369Z test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_1_n_1_m_0_cuda_complex64 PASSED [0.0041s] [ 1%] 2025-08-14T23:40:40.9176965Z test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_1_n_1_m_1_cuda_float32 PASSED [0.0049s] [ 1%] 2025-08-14T23:40:40.9177557Z test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_1_n_1_m_1_cuda_float64 PASSED [0.0049s] [ 1%] 2025-08-14T23:40:40.9178159Z test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_8_n_0_m_0_cuda_complex128 PASSED [0.0038s] [ 1%] 2025-08-14T23:40:40.9178757Z test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_8_n_0_m_0_cuda_float32 PASSED [0.0037s] [ 1%] 2025-08-14T23:40:40.9179350Z test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_8_n_0_m_1_cuda_float64 PASSED [0.0039s] [ 1%] 2025-08-14T23:40:40.9179950Z test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_8_n_0_m_25_cuda_complex128 PASSED [0.0040s] [ 1%] 2025-08-14T23:40:40.9180651Z test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_8_n_0_m_25_cuda_complex64 PASSED [0.0029s] [ 2%] 2025-08-14T23:40:40.9181443Z test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_8_n_0_m_25_cuda_float64 PASSED [0.0028s] [ 2%] 2025-08-14T23:40:40.9182105Z test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_8_n_10_m_0_cuda_complex128 PASSED [0.0031s] [ 2%] 2025-08-14T23:40:40.9182706Z test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_8_n_10_m_0_cuda_float32 PASSED [0.0029s] [ 2%] 2025-08-14T23:40:40.9183363Z test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_8_n_10_m_0_cuda_float64 PASSED [0.0030s] [ 2%] 2025-08-14T23:40:40.9183963Z test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_8_n_10_m_1_cuda_complex128 PASSED [0.0061s] [ 2%] 2025-08-14T23:40:40.9184564Z test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_8_n_10_m_25_cuda_float32 PASSED [0.0052s] [ 2%] 2025-08-14T23:40:40.9185171Z test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_8_n_1_m_0_cuda_complex64 PASSED [0.0031s] [ 2%] 2025-08-14T23:40:40.9185772Z test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_8_n_1_m_0_cuda_float64 PASSED [0.0030s] [ 2%] 2025-08-14T23:40:40.9186436Z test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_8_n_1_m_25_cuda_complex128 PASSED [0.0046s] [ 2%] 2025-08-14T23:40:40.9187051Z test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_8_n_1_m_25_cuda_float32 PASSED [0.0041s] [ 2%] 2025-08-14T23:40:40.9187649Z test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_k_8_n_1_m_25_cuda_float64 PASSED [0.0041s] [ 2%] 2025-08-14T23:40:40.9188254Z test_sparse_csr.py::TestSparseCSRCUDA::test_addmv_shape_11x9_cuda_float32 SKIPPED [0.0008s] (Only runs on cpu) [ 2%] 2025-08-14T23:40:40.9188851Z test_sparse_csr.py::TestSparseCSRCUDA::test_addmv_shape_3x3_cuda_float32 SKIPPED [0.0008s] (Only runs on cpu) [ 2%] 2025-08-14T23:40:40.9189493Z test_sparse_csr.py::TestSparseCSRCUDA::test_addmv_shape_3x3_cuda_float64 SKIPPED [0.0008s] (Only runs on cpu) [ 2%] 2025-08-14T23:40:40.9190080Z test_sparse_csr.py::TestSparseCSRCUDA::test_addmv_shape_5x7_cuda_float64 SKIPPED [0.0008s] (Only runs on cpu) [ 2%] 2025-08-14T23:40:40.9190658Z test_sparse_csr.py::TestSparseCSRCUDA::test_autograd_dense_output_mm_cuda_float64 PASSED [0.2081s] [ 3%] 2025-08-14T23:40:40.9191227Z test_sparse_csr.py::TestSparseCSRCUDA::test_autograd_sparse_csr_unary_abs_cuda_complex128 PASSED [0.0672s] [ 3%] 2025-08-14T23:40:40.9191994Z test_sparse_csr.py::TestSparseCSRCUDA::test_autograd_sparse_csr_unary_angle_cuda_complex128 SKIPPED [0.0017s] (Skipped! Unary op angle not supported with CSR input and autograd) [ 3%] 2025-08-14T23:40:40.9192932Z test_sparse_csr.py::TestSparseCSRCUDA::test_autograd_sparse_csr_unary_asinh_cuda_complex128 SKIPPED [0.0017s] (Skipped! Unary op asinh not supported with CSR input and autograd) [ 3%] 2025-08-14T23:40:40.9193861Z test_sparse_csr.py::TestSparseCSRCUDA::test_autograd_sparse_csr_unary_asinh_cuda_float64 SKIPPED [0.0017s] (Skipped! Unary op asinh not supported with CSR input and autograd) [ 3%] 2025-08-14T23:40:40.9194783Z test_sparse_csr.py::TestSparseCSRCUDA::test_autograd_sparse_csr_unary_atan_cuda_complex128 SKIPPED [0.0016s] (Skipped! Unary op atan not supported with CSR input and autograd) [ 3%] 2025-08-14T23:40:40.9195707Z test_sparse_csr.py::TestSparseCSRCUDA::test_autograd_sparse_csr_unary_atanh_cuda_complex128 SKIPPED [0.0017s] (Skipped! Unary op atanh not supported with CSR input and autograd) [ 3%] 2025-08-14T23:40:40.9196503Z test_sparse_csr.py::TestSparseCSRCUDA::test_autograd_sparse_csr_unary_conj_physical_cuda_complex128 PASSED [0.0118s] [ 3%] 2025-08-14T23:40:40.9197293Z test_sparse_csr.py::TestSparseCSRCUDA::test_autograd_sparse_csr_unary_erfinv_cuda_float64 SKIPPED [0.0023s] (Skipped! Unary op erfinv not supported with CSR input and autograd) [ 3%] 2025-08-14T23:40:40.9198226Z test_sparse_csr.py::TestSparseCSRCUDA::test_autograd_sparse_csr_unary_expm1_cuda_complex128 SKIPPED [0.0016s] (Skipped! Unary op expm1 not supported with CSR input and autograd) [ 3%] 2025-08-14T23:40:40.9199213Z test_sparse_csr.py::TestSparseCSRCUDA::test_autograd_sparse_csr_unary_expm1_cuda_float64 SKIPPED [0.0016s] (Skipped! Unary op expm1 not supported with CSR input and autograd) [ 3%] 2025-08-14T23:40:40.9200122Z test_sparse_csr.py::TestSparseCSRCUDA::test_autograd_sparse_csr_unary_floor_cuda_float64 SKIPPED [0.0017s] (Skipped! Unary op floor not supported with CSR input and autograd) [ 3%] 2025-08-14T23:40:40.9201151Z test_sparse_csr.py::TestSparseCSRCUDA::test_autograd_sparse_csr_unary_isinf_cuda_complex128 SKIPPED [0.0018s] (Skipped! Unary op isinf not supported with CSR input and autograd) [ 3%] 2025-08-14T23:40:40.9202074Z test_sparse_csr.py::TestSparseCSRCUDA::test_autograd_sparse_csr_unary_isinf_cuda_float64 SKIPPED [0.0016s] (Skipped! Unary op isinf not supported with CSR input and autograd) [ 3%] 2025-08-14T23:40:40.9203007Z test_sparse_csr.py::TestSparseCSRCUDA::test_autograd_sparse_csr_unary_isneginf_cuda_float64 SKIPPED [0.0016s] (Skipped! Unary op isneginf not supported with CSR input and autograd) [ 3%] 2025-08-14T23:40:40.9203845Z test_sparse_csr.py::TestSparseCSRCUDA::test_autograd_sparse_csr_unary_neg_cuda_complex128 PASSED [0.0072s] [ 3%] 2025-08-14T23:40:40.9204606Z test_sparse_csr.py::TestSparseCSRCUDA::test_autograd_sparse_csr_unary_sinh_cuda_complex128 SKIPPED [0.0016s] (Skipped! Unary op sinh not supported with CSR input and autograd) [ 3%] 2025-08-14T23:40:40.9205518Z test_sparse_csr.py::TestSparseCSRCUDA::test_autograd_sparse_csr_unary_sqrt_cuda_float64 SKIPPED [0.0016s] (Skipped! Unary op sqrt not supported with CSR input and autograd) [ 4%] 2025-08-14T23:40:40.9206424Z test_sparse_csr.py::TestSparseCSRCUDA::test_autograd_sparse_csr_unary_tan_cuda_complex128 SKIPPED [0.0016s] (Skipped! Unary op tan not supported with CSR input and autograd) [ 4%] 2025-08-14T23:40:40.9207404Z test_sparse_csr.py::TestSparseCSRCUDA::test_autograd_sparse_csr_unary_tanh_cuda_complex128 SKIPPED [0.0018s] (Skipped! Unary op tanh not supported with CSR input and autograd) [ 4%] 2025-08-14T23:40:40.9208215Z test_sparse_csr.py::TestSparseCSRCUDA::test_block_addmm_block_size_2_int32_noncontiguous_True_cuda_float16 PASSED [0.7505s] [ 4%] 2025-08-14T23:40:40.9208883Z test_sparse_csr.py::TestSparseCSRCUDA::test_block_addmm_block_size_2_int32_noncontiguous_True_cuda_float64 PASSED [0.1050s] [ 4%] 2025-08-14T23:40:40.9209551Z test_sparse_csr.py::TestSparseCSRCUDA::test_block_addmm_block_size_2_int64_noncontiguous_False_cuda_float32 PASSED [0.1066s] [ 4%] 2025-08-14T23:40:40.9210227Z test_sparse_csr.py::TestSparseCSRCUDA::test_block_addmm_block_size_2_int64_noncontiguous_True_cuda_complex128 PASSED [0.0839s] [ 4%] 2025-08-14T23:40:40.9210898Z test_sparse_csr.py::TestSparseCSRCUDA::test_block_addmm_block_size_2_int64_noncontiguous_True_cuda_float64 PASSED [0.0777s] [ 4%] 2025-08-14T23:40:40.9211572Z test_sparse_csr.py::TestSparseCSRCUDA::test_block_addmm_block_size_3_int32_noncontiguous_False_cuda_float16 PASSED [0.0890s] [ 4%] 2025-08-14T23:40:40.9212248Z test_sparse_csr.py::TestSparseCSRCUDA::test_block_addmm_block_size_3_int32_noncontiguous_True_cuda_complex64 PASSED [0.1006s] [ 4%] 2025-08-14T23:40:40.9212921Z test_sparse_csr.py::TestSparseCSRCUDA::test_block_addmm_block_size_3_int64_noncontiguous_False_cuda_float16 PASSED [0.0893s] [ 4%] 2025-08-14T23:40:40.9213587Z test_sparse_csr.py::TestSparseCSRCUDA::test_block_addmm_block_size_3_int64_noncontiguous_True_cuda_bfloat16 PASSED [0.0900s] [ 4%] 2025-08-14T23:40:40.9214255Z test_sparse_csr.py::TestSparseCSRCUDA::test_block_addmm_block_size_3_int64_noncontiguous_True_cuda_complex64 PASSED [0.0848s] [ 4%] 2025-08-14T23:40:40.9214929Z test_sparse_csr.py::TestSparseCSRCUDA::test_block_addmv_block_size_2_int32_noncontiguous_True_cuda_complex64 PASSED [0.1302s] [ 4%] 2025-08-14T23:40:40.9215599Z test_sparse_csr.py::TestSparseCSRCUDA::test_block_addmv_block_size_2_int32_noncontiguous_True_cuda_float32 PASSED [0.0086s] [ 4%] 2025-08-14T23:40:40.9216351Z test_sparse_csr.py::TestSparseCSRCUDA::test_block_addmv_block_size_2_int64_noncontiguous_False_cuda_complex128 PASSED [0.0123s] [ 4%] 2025-08-14T23:40:40.9217034Z test_sparse_csr.py::TestSparseCSRCUDA::test_block_addmv_block_size_2_int64_noncontiguous_False_cuda_complex64 PASSED [0.0116s] [ 4%] 2025-08-14T23:40:40.9217708Z test_sparse_csr.py::TestSparseCSRCUDA::test_block_addmv_block_size_2_int64_noncontiguous_True_cuda_float64 PASSED [0.0085s] [ 5%] 2025-08-14T23:40:40.9218444Z test_sparse_csr.py::TestSparseCSRCUDA::test_block_addmv_block_size_3_int32_noncontiguous_False_cuda_complex128 PASSED [0.0911s] [ 5%] 2025-08-14T23:40:40.9219123Z test_sparse_csr.py::TestSparseCSRCUDA::test_block_addmv_block_size_3_int64_noncontiguous_False_cuda_complex128 PASSED [0.0143s] [ 5%] 2025-08-14T23:40:40.9219791Z test_sparse_csr.py::TestSparseCSRCUDA::test_block_addmv_block_size_3_int64_noncontiguous_True_cuda_float64 PASSED [0.0085s] [ 5%] 2025-08-14T23:40:40.9220499Z test_sparse_csr.py::TestSparseCSRCUDA::test_block_triangular_solve_block_size_2_int32_noncontiguous_False_cuda_complex64 PASSED [0.3578s] [ 5%] 2025-08-14T23:40:40.9221306Z test_sparse_csr.py::TestSparseCSRCUDA::test_block_triangular_solve_block_size_2_int32_noncontiguous_False_cuda_float32 PASSED [0.0440s] [ 5%] 2025-08-14T23:40:40.9222042Z test_sparse_csr.py::TestSparseCSRCUDA::test_block_triangular_solve_block_size_2_int64_noncontiguous_False_cuda_float64 PASSED [0.0446s] [ 5%] 2025-08-14T23:40:40.9222781Z test_sparse_csr.py::TestSparseCSRCUDA::test_block_triangular_solve_block_size_3_int64_noncontiguous_False_cuda_complex128 PASSED [0.0483s] [ 5%] 2025-08-14T23:40:40.9223516Z test_sparse_csr.py::TestSparseCSRCUDA::test_block_triangular_solve_block_size_3_int64_noncontiguous_False_cuda_float32 PASSED [0.0463s] [ 5%] 2025-08-14T23:40:40.9224250Z test_sparse_csr.py::TestSparseCSRCUDA::test_block_triangular_solve_block_size_3_int64_noncontiguous_True_cuda_complex128 PASSED [0.0443s] [ 5%] 2025-08-14T23:40:40.9225043Z test_sparse_csr.py::TestSparseCSRCUDA::test_block_triangular_solve_block_size_3_int64_noncontiguous_True_cuda_float64 PASSED [0.0428s] [ 5%] 2025-08-14T23:40:40.9225629Z test_sparse_csr.py::TestSparseCSRCUDA::test_bmm_cuda_float32 PASSED [0.0147s] [ 5%] 2025-08-14T23:40:40.9226202Z test_sparse_csr.py::TestSparseCSRCUDA::test_compressed_layout_conversions_coverage_SparseBSC_SparseBSR_cuda PASSED [0.0067s] [ 5%] 2025-08-14T23:40:40.9226897Z test_sparse_csr.py::TestSparseCSRCUDA::test_compressed_layout_conversions_coverage_SparseCSC_SparseCSR_cuda PASSED [0.0052s] [ 5%] 2025-08-14T23:40:40.9227591Z test_sparse_csr.py::TestSparseCSRCUDA::test_compressed_layout_conversions_coverage_SparseCSR_SparseBSR_cuda PASSED [0.0026s] [ 5%] 2025-08-14T23:40:40.9228200Z test_sparse_csr.py::TestSparseCSRCUDA::test_coo_csr_conversion_cuda_complex64 PASSED [0.0243s] [ 5%] 2025-08-14T23:40:40.9228717Z test_sparse_csr.py::TestSparseCSRCUDA::test_coo_csr_conversion_cuda_float64 PASSED [0.0054s] [ 6%] 2025-08-14T23:40:40.9229212Z test_sparse_csr.py::TestSparseCSRCUDA::test_coo_csr_conversion_cuda_int64 PASSED [0.0046s] [ 6%] 2025-08-14T23:40:40.9229708Z test_sparse_csr.py::TestSparseCSRCUDA::test_csr_coo_conversion_cuda_bool PASSED [0.0099s] [ 6%] 2025-08-14T23:40:40.9230218Z test_sparse_csr.py::TestSparseCSRCUDA::test_csr_coo_conversion_cuda_complex128 PASSED [0.0056s] [ 6%] 2025-08-14T23:40:40.9230741Z test_sparse_csr.py::TestSparseCSRCUDA::test_csr_coo_conversion_cuda_complex64 PASSED [0.0056s] [ 6%] 2025-08-14T23:40:40.9231250Z test_sparse_csr.py::TestSparseCSRCUDA::test_csr_coo_conversion_cuda_int32 PASSED [0.0044s] [ 6%] 2025-08-14T23:40:40.9231742Z test_sparse_csr.py::TestSparseCSRCUDA::test_csr_coo_conversion_cuda_int8 PASSED [0.0044s] [ 6%] 2025-08-14T23:40:40.9232222Z test_sparse_csr.py::TestSparseCSRCUDA::test_csr_is_contiguous_cuda PASSED [0.0019s] [ 6%] 2025-08-14T23:40:40.9232820Z test_sparse_csr.py::TestSparseCSRCUDA::test_csr_matvec_cuda_bfloat16 SKIPPED [0.0008s] (ROCm doesn't work with half dtypes correctly.) [ 6%] 2025-08-14T23:40:40.9233485Z test_sparse_csr.py::TestSparseCSRCUDA::test_csr_matvec_cuda_complex64 PASSED [0.1508s] [ 6%] 2025-08-14T23:40:40.9233982Z test_sparse_csr.py::TestSparseCSRCUDA::test_csr_nnz_cuda SKIPPED [0.0010s] (Only runs on cpu) [ 6%] 2025-08-14T23:40:40.9234535Z test_sparse_csr.py::TestSparseCSRCUDA::test_csr_to_block_csr_blocksize_2_cuda_float64_int32 PASSED [0.0170s] [ 6%] 2025-08-14T23:40:40.9235237Z test_sparse_csr.py::TestSparseCSRCUDA::test_dense_to_from_sparse_compressed_SparseBSC_NonBatched_NonHybrid_cuda PASSED [0.0154s] [ 6%] 2025-08-14T23:40:40.9235928Z test_sparse_csr.py::TestSparseCSRCUDA::test_dense_to_from_sparse_compressed_SparseCSC_Batched_Hybrid_cuda PASSED [0.1236s] [ 6%] 2025-08-14T23:40:40.9236602Z test_sparse_csr.py::TestSparseCSRCUDA::test_dense_to_from_sparse_compressed_SparseCSC_NonBatched_Hybrid_cuda PASSED [0.0082s] [ 6%] 2025-08-14T23:40:40.9237297Z test_sparse_csr.py::TestSparseCSRCUDA::test_dense_to_from_sparse_compressed_SparseCSC_NonBatched_NonHybrid_cuda PASSED [0.0052s] [ 6%] 2025-08-14T23:40:40.9237980Z test_sparse_csr.py::TestSparseCSRCUDA::test_dense_to_from_sparse_compressed_SparseCSR_Batched_Hybrid_cuda PASSED [0.1211s] [ 6%] 2025-08-14T23:40:40.9238654Z test_sparse_csr.py::TestSparseCSRCUDA::test_direct_coo_csr_conversion_cuda_complex64 PASSED [0.0060s] [ 7%] 2025-08-14T23:40:40.9239216Z test_sparse_csr.py::TestSparseCSRCUDA::test_direct_coo_csr_conversion_cuda_float32 PASSED [0.0056s] [ 7%] 2025-08-14T23:40:40.9239757Z test_sparse_csr.py::TestSparseCSRCUDA::test_direct_coo_csr_conversion_cuda_int32 PASSED [0.0047s] [ 7%] 2025-08-14T23:40:40.9240292Z test_sparse_csr.py::TestSparseCSRCUDA::test_direct_coo_csr_conversion_cuda_uint8 PASSED [0.0047s] [ 7%] 2025-08-14T23:40:40.9240865Z test_sparse_csr.py::TestSparseCSRCUDA::test_exercise_detach_cuda_float16 PASSED [0.0034s] [ 7%] 2025-08-14T23:40:40.9241360Z test_sparse_csr.py::TestSparseCSRCUDA::test_exercise_detach_cuda_int16 PASSED [0.0031s] [ 7%] 2025-08-14T23:40:40.9241913Z test_sparse_csr.py::TestSparseCSRCUDA::test_exercise_detach_cuda_int8 PASSED [0.0031s] [ 7%] 2025-08-14T23:40:40.9242524Z test_sparse_csr.py::TestSparseCSRCUDA::test_linalg_solve_sparse_csr_cusolver_cuda_float32 SKIPPED [0.0002s] (The test requires cudss) [ 7%] 2025-08-14T23:40:40.9243110Z test_sparse_csr.py::TestSparseCSRCUDA::test_mm_cuda_float64 PASSED [1.1032s] [ 7%] 2025-08-14T23:40:40.9243541Z test_sparse_csr.py::TestSparseCSRCUDA::test_mul_cuda_float64 PASSED [0.3000s] [ 7%] 2025-08-14T23:40:40.9244081Z test_sparse_csr.py::TestSparseCSRCUDA::test_mul_scalar_enable_hybrid_False_SparseBSC_cuda_bfloat16 PASSED [0.9504s] [ 7%] 2025-08-14T23:40:40.9244706Z test_sparse_csr.py::TestSparseCSRCUDA::test_mul_scalar_enable_hybrid_False_SparseBSC_cuda_bool PASSED [0.5199s] [ 7%] 2025-08-14T23:40:40.9245331Z test_sparse_csr.py::TestSparseCSRCUDA::test_mul_scalar_enable_hybrid_False_SparseBSC_cuda_complex64 PASSED [1.0832s] [ 7%] 2025-08-14T23:40:40.9245964Z test_sparse_csr.py::TestSparseCSRCUDA::test_mul_scalar_enable_hybrid_False_SparseBSC_cuda_float32 PASSED [0.9520s] [ 7%] 2025-08-14T23:40:40.9246585Z test_sparse_csr.py::TestSparseCSRCUDA::test_mul_scalar_enable_hybrid_False_SparseBSC_cuda_int32 PASSED [0.6701s] [ 7%] 2025-08-14T23:40:40.9247200Z test_sparse_csr.py::TestSparseCSRCUDA::test_mul_scalar_enable_hybrid_False_SparseBSR_cuda_bool PASSED [0.5171s] [ 7%] 2025-08-14T23:40:40.9247814Z test_sparse_csr.py::TestSparseCSRCUDA::test_mul_scalar_enable_hybrid_False_SparseBSR_cuda_float32 PASSED [0.9437s] [ 7%] 2025-08-14T23:40:40.9248425Z test_sparse_csr.py::TestSparseCSRCUDA::test_mul_scalar_enable_hybrid_False_SparseBSR_cuda_int32 PASSED [0.6679s] [ 8%] 2025-08-14T23:40:40.9249030Z test_sparse_csr.py::TestSparseCSRCUDA::test_mul_scalar_enable_hybrid_False_SparseBSR_cuda_int64 PASSED [0.6694s] [ 8%] 2025-08-14T23:40:40.9249651Z test_sparse_csr.py::TestSparseCSRCUDA::test_mul_scalar_enable_hybrid_False_SparseCSC_cuda_complex64 PASSED [1.0687s] [ 8%] 2025-08-14T23:40:40.9250276Z test_sparse_csr.py::TestSparseCSRCUDA::test_mul_scalar_enable_hybrid_False_SparseCSC_cuda_float16 PASSED [0.8456s] [ 8%] 2025-08-14T23:40:40.9250961Z test_sparse_csr.py::TestSparseCSRCUDA::test_mul_scalar_enable_hybrid_False_SparseCSC_cuda_int16 PASSED [0.6629s] [ 8%] 2025-08-14T23:40:40.9251573Z test_sparse_csr.py::TestSparseCSRCUDA::test_mul_scalar_enable_hybrid_False_SparseCSR_cuda_float32 PASSED [0.9332s] [ 8%] 2025-08-14T23:40:40.9252243Z test_sparse_csr.py::TestSparseCSRCUDA::test_resize_as_sparse_compressed_SparseBSC_cuda_bool PASSED [0.0269s] [ 8%] 2025-08-14T23:40:40.9252837Z test_sparse_csr.py::TestSparseCSRCUDA::test_resize_as_sparse_compressed_SparseCSC_cuda_bool PASSED [0.0390s] [ 8%] 2025-08-14T23:40:40.9253441Z test_sparse_csr.py::TestSparseCSRCUDA::test_resize_as_sparse_compressed_SparseCSC_cuda_float32 PASSED [0.0390s] [ 8%] 2025-08-14T23:40:40.9253965Z test_sparse_csr.py::TestSparseCSRCUDA::test_resize_cuda_int64 PASSED [0.0034s] [ 8%] 2025-08-14T23:40:40.9254406Z test_sparse_csr.py::TestSparseCSRCUDA::test_resize_cuda_int8 PASSED [0.0034s] [ 8%] 2025-08-14T23:40:40.9254869Z test_sparse_csr.py::TestSparseCSRCUDA::test_resize_errors_cuda_bfloat16 PASSED [0.0026s] [ 8%] 2025-08-14T23:40:40.9255435Z test_sparse_csr.py::TestSparseCSRCUDA::test_resize_errors_cuda_complex128 PASSED [0.0023s] [ 8%] 2025-08-14T23:40:40.9255936Z test_sparse_csr.py::TestSparseCSRCUDA::test_resize_errors_cuda_float16 PASSED [0.0023s] [ 8%] 2025-08-14T23:40:40.9256415Z test_sparse_csr.py::TestSparseCSRCUDA::test_resize_errors_cuda_int8 PASSED [0.0023s] [ 8%] 2025-08-14T23:40:40.9256919Z test_sparse_csr.py::TestSparseCSRCUDA::test_sampled_addmm_errors_cuda_complex64 PASSED [0.0029s] [ 8%] 2025-08-14T23:40:40.9257446Z test_sparse_csr.py::TestSparseCSRCUDA::test_sampled_addmm_errors_cuda_float32 PASSED [0.0023s] [ 9%] 2025-08-14T23:40:40.9257989Z test_sparse_csr.py::TestSparseCSRCUDA::test_sampled_addmm_zero_sized_cuda_complex128 PASSED [0.0026s] [ 9%] 2025-08-14T23:40:40.9258595Z test_sparse_csr.py::TestSparseCSRCUDA::test_sampled_addmm_zero_sized_cuda_float32 PASSED [0.0029s] [ 9%] 2025-08-14T23:40:40.9259558Z test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseBSC_int32_cuda_bool :0:rocdevice.cpp :2992: 290120842614 us: Callback: Queue 0x7f8f71600000 aborting with error : HSA_STATUS_ERROR_EXCEPTION: An HSAIL operation resulted in a hardware exception. code: 0x1016 2025-08-14T23:40:40.9260359Z Fatal Python error: Aborted 2025-08-14T23:40:40.9260491Z 2025-08-14T23:40:40.9260595Z Thread 0x00007f8f4e5ff640 (most recent call first): 2025-08-14T23:40:40.9260844Z 2025-08-14T23:40:40.9260954Z 2025-08-14T23:40:40.9261048Z Thread 0x00007f9454e51440 (most recent call first): 2025-08-14T23:40:40.9261583Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3614 in random_sparse_compressed 2025-08-14T23:40:40.9262309Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3629 in 2025-08-14T23:40:40.9263031Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3629 in genSparseCompressedTensor 2025-08-14T23:40:40.9263638Z File "/var/lib/jenkins/pytorch/test/test_sparse_csr.py", line 1104 in test_select 2025-08-14T23:40:40.9264206Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 698 in test_wrapper 2025-08-14T23:40:40.9264919Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_device_type.py", line 426 in instantiated_test 2025-08-14T23:40:40.9265617Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3224 in wrapper 2025-08-14T23:40:40.9266266Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3224 in wrapper 2025-08-14T23:40:40.9266836Z File "/opt/conda/envs/py_3.10/lib/python3.10/unittest/case.py", line 549 in _callTestMethod 2025-08-14T23:40:40.9267346Z File "/opt/conda/envs/py_3.10/lib/python3.10/unittest/case.py", line 591 in run 2025-08-14T23:40:40.9267892Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3376 in _run_custom 2025-08-14T23:40:40.9268548Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3406 in run 2025-08-14T23:40:40.9269269Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_device_type.py", line 517 in run 2025-08-14T23:40:40.9269816Z File "/opt/conda/envs/py_3.10/lib/python3.10/unittest/case.py", line 650 in __call__ 2025-08-14T23:40:40.9270294Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/_pytest/unittest.py", line 333 in runtest 2025-08-14T23:40:40.9270847Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/_pytest/runner.py", line 169 in pytest_runtest_call 2025-08-14T23:40:40.9271404Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/pluggy/_callers.py", line 121 in _multicall 2025-08-14T23:40:40.9271935Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/pluggy/_manager.py", line 120 in _hookexec 2025-08-14T23:40:40.9272506Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/pluggy/_hooks.py", line 512 in __call__ 2025-08-14T23:40:40.9273022Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/_pytest/runner.py", line 262 in 2025-08-14T23:40:40.9273534Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/_pytest/runner.py", line 341 in from_call 2025-08-14T23:40:40.9274076Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/_pytest/runner.py", line 261 in call_runtest_hook 2025-08-14T23:40:40.9274626Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/_pytest/runner.py", line 222 in call_and_report 2025-08-14T23:40:40.9275229Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/_pytest/runner.py", line 133 in runtestprotocol 2025-08-14T23:40:40.9275853Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/pytest_rerunfailures.py", line 549 in pytest_runtest_protocol 2025-08-14T23:40:40.9276451Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/pluggy/_callers.py", line 121 in _multicall 2025-08-14T23:40:40.9276984Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/pluggy/_manager.py", line 120 in _hookexec 2025-08-14T23:40:40.9277500Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/pluggy/_hooks.py", line 512 in __call__ 2025-08-14T23:40:40.9278037Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/_pytest/main.py", line 348 in pytest_runtestloop 2025-08-14T23:40:40.9278581Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/pluggy/_callers.py", line 121 in _multicall 2025-08-14T23:40:40.9279105Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/pluggy/_manager.py", line 120 in _hookexec 2025-08-14T23:40:40.9279626Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/pluggy/_hooks.py", line 512 in __call__ 2025-08-14T23:40:40.9280125Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/_pytest/main.py", line 323 in _main 2025-08-14T23:40:40.9280675Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/_pytest/main.py", line 269 in wrap_session 2025-08-14T23:40:40.9281221Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/_pytest/main.py", line 316 in pytest_cmdline_main 2025-08-14T23:40:40.9281773Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/pluggy/_callers.py", line 121 in _multicall 2025-08-14T23:40:40.9282298Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/pluggy/_manager.py", line 120 in _hookexec 2025-08-14T23:40:40.9282816Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/pluggy/_hooks.py", line 512 in __call__ 2025-08-14T23:40:40.9283347Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/_pytest/config/__init__.py", line 166 in main 2025-08-14T23:40:40.9284030Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 1299 in run_tests 2025-08-14T23:40:40.9284579Z File "/var/lib/jenkins/pytorch/test/test_sparse_csr.py", line 4334 in 2025-08-14T23:40:40.9284824Z 2025-08-14T23:40:40.9296353Z Extension modules: numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, torch._C, torch._C._dynamo.autograd_compiler, torch._C._dynamo.eval_frame, torch._C._dynamo.guards, torch._C._dynamo.utils, torch._C._fft, torch._C._linalg, torch._C._nested, torch._C._nn, torch._C._sparse, torch._C._special, multidict._multidict, yarl._quoting_c, propcache._helpers_c, aiohttp._http_writer, aiohttp._http_parser, aiohttp._websocket.mask, aiohttp._websocket.reader_c, frozenlist._frozenlist, charset_normalizer.md, requests.packages.charset_normalizer.md, requests.packages.chardet.md, thriftpy2.transport.cybase, thriftpy2.transport.buffered.cybuffered, thriftpy2.transport.framed.cyframed, thriftpy2.transport.memory.cymemory, thriftpy2.transport.sasl.cysasl, thriftpy2.protocol.cybin.cybin, yaml._yaml, numba.core.typeconv._typeconv, numba._helperlib, numba._dynfunc, numba._dispatcher, numba.core.runtime._nrt_python, numba.np.ufunc._internal, scipy._lib._ccallback_c, numba.mviewbuf, psutil._psutil_linux, psutil._psutil_posix, scipy._lib._uarray._uarray, scipy.special._ufuncs_cxx, scipy.special._ufuncs, scipy.special._specfun, scipy.special._comb, scipy.linalg._fblas, scipy.linalg._flapack, scipy.linalg._cythonized_array_utils, scipy.linalg._flinalg, scipy.linalg._solve_toeplitz, scipy.linalg._matfuncs_sqrtm_triu, scipy.linalg.cython_lapack, scipy.linalg.cython_blas, scipy.linalg._matfuncs_expm, scipy.linalg._decomp_update, scipy.sparse._sparsetools, _csparsetools, scipy.sparse._csparsetools, scipy.sparse.linalg._isolve._iterative, scipy.sparse.linalg._dsolve._superlu, scipy.sparse.linalg._eigen.arpack._arpack, scipy.sparse.csgraph._tools, scipy.sparse.csgraph._shortest_path, scipy.sparse.csgraph._traversal, scipy.sparse.csgraph._min_spanning_tree, scipy.sparse.csgraph._flow, scipy.sparse.csgraph._matching, scipy.sparse.csgraph._reordering, scipy.special._ellip_harm_2, scipy.signal._sigtools, scipy.signal._max_len_seq_inner, scipy.signal._upfirdn_apply, scipy.signal._spline, scipy.optimize._minpack2, scipy.optimize._group_columns, scipy._lib.messagestream, scipy.optimize._trlib._trlib, numpy.linalg.lapack_lite, scipy.optimize._lbfgsb, _moduleTNC, scipy.optimize._moduleTNC, scipy.optimize._cobyla, scipy.optimize._slsqp, scipy.optimize._minpack, scipy.optimize._lsq.givens_elimination, scipy.optimize._zeros, scipy.optimize.__nnls, scipy.optimize._highs.cython.src._highs_wrapper, scipy.optimize._highs._highs_wrapper, scipy.optimize._highs.cython.src._highs_constants, scipy.optimize._highs._highs_constants, scipy.linalg._interpolative, scipy.optimize._bglu_dense, scipy.optimize._lsap, scipy.spatial._ckdtree, scipy.spatial._qhull, scipy.spatial._voronoi, scipy.spatial._distance_wrap, scipy.spatial._hausdorff, scipy.spatial.transform._rotation, scipy.optimize._direct, scipy.integrate._odepack, scipy.integrate._quadpack, scipy.integrate._vode, scipy.integrate._dop, scipy.integrate._lsoda, scipy.interpolate._fitpack, scipy.interpolate.dfitpack, scipy.interpolate._bspl, scipy.interpolate._ppoly, scipy.interpolate.interpnd, scipy.interpolate._rbfinterp_pythran, scipy.interpolate._rgi_cython, scipy.signal._sosfilt, scipy.ndimage._nd_image, _ni_label, scipy.ndimage._ni_label, scipy.signal._spectral, scipy.special.cython_special, scipy.stats._stats, scipy.stats.beta_ufunc, scipy.stats._boost.beta_ufunc, scipy.stats.binom_ufunc, scipy.stats._boost.binom_ufunc, scipy.stats.nbinom_ufunc, scipy.stats._boost.nbinom_ufunc, scipy.stats.hypergeom_ufunc, scipy.stats._boost.hypergeom_ufunc, scipy.stats.ncf_ufunc, scipy.stats._boost.ncf_ufunc, scipy.stats.ncx2_ufunc, scipy.stats._boost.ncx2_ufunc, scipy.stats.nct_ufunc, scipy.stats._boost.nct_ufunc, scipy.stats.skewnorm_ufunc, scipy.stats._boost.skewnorm_ufunc, scipy.stats.invgauss_ufunc, scipy.stats._boost.invgauss_ufunc, scipy.stats._biasedurn, scipy.stats._levy_stable.levyst, scipy.stats._stats_pythran, scipy.stats._statlib, scipy.stats._mvn, scipy.stats._sobol, scipy.stats._qmc_cy, scipy.stats._rcont.rcont, scipy.signal._peak_finding_utils (total: 159) 2025-08-14T23:40:40.9308109Z Got exit code -6 (SIGIOT) 2025-08-14T23:40:40.9308312Z Retrying single test... 2025-08-14T23:40:40.9309278Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/hypothesis/entry_points.py:23: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. 2025-08-14T23:40:40.9310261Z import pkg_resources 2025-08-14T23:40:40.9310645Z Test results will be stored in test-reports/python-pytest/test_sparse_csr/test_sparse_csr-d847c28fe0f8301d.xml 2025-08-14T23:40:40.9311092Z ============================= test session starts ============================== 2025-08-14T23:40:40.9311549Z platform linux -- Python 3.10.18, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-08-14T23:40:40.9311910Z cachedir: .pytest_cache 2025-08-14T23:40:40.9312337Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-08-14T23:40:40.9312795Z rootdir: /var/lib/jenkins/pytorch 2025-08-14T23:40:40.9313011Z configfile: pytest.ini 2025-08-14T23:40:40.9313446Z plugins: rerunfailures-14.0, flakefinder-1.1.0, xdist-3.3.1, hypothesis-5.35.1, cpp-2.3.0, xdoctest-1.1.0, subtests-0.13.1, typeguard-4.3.0 2025-08-14T23:40:40.9313983Z collecting ... collected 4958 items / 1665 deselected / 3293 selected 2025-08-14T23:40:40.9314599Z stepcurrent: skipping 152 already run items. Running only test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseBSC_int32_cuda_bool 2025-08-14T23:40:40.9315093Z Running 1 items in this shard 2025-08-14T23:40:40.9315225Z 2025-08-14T23:40:40.9315737Z test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseBSC_int32_cuda_bool [W814 23:36:27.299522915 Module.cpp:192] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-08-14T23:40:40.9316303Z 2025-08-14T23:40:40.9316605Z [W814 23:36:36.294491917 Module.cpp:192] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-08-14T23:40:40.9316972Z 2025-08-14T23:40:40.9317267Z [W814 23:36:36.354314673 Module.cpp:192] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-08-14T23:40:40.9317627Z 2025-08-14T23:40:40.9317708Z PASSED [9.5101s] [100%] 2025-08-14T23:40:40.9317821Z 2025-08-14T23:40:40.9318187Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/test_sparse_csr/test_sparse_csr-d847c28fe0f8301d.xml - 2025-08-14T23:40:40.9318747Z ====================== 1 passed, 1665 deselected in 9.75s ====================== 2025-08-14T23:40:40.9319014Z Got exit code 0 2025-08-14T23:40:40.9319269Z Test succeeeded in new process, continuing with the rest of the tests 2025-08-14T23:40:40.9320364Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/hypothesis/entry_points.py:23: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. 2025-08-14T23:40:40.9321340Z import pkg_resources 2025-08-14T23:40:40.9321718Z Test results will be stored in test-reports/python-pytest/test_sparse_csr/test_sparse_csr-6893d71aee8f1add.xml 2025-08-14T23:40:40.9322171Z ============================= test session starts ============================== 2025-08-14T23:40:40.9322652Z platform linux -- Python 3.10.18, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-08-14T23:40:40.9323013Z cachedir: .pytest_cache 2025-08-14T23:40:40.9323438Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-08-14T23:40:40.9323897Z rootdir: /var/lib/jenkins/pytorch 2025-08-14T23:40:40.9324181Z configfile: pytest.ini 2025-08-14T23:40:40.9324593Z plugins: rerunfailures-14.0, flakefinder-1.1.0, xdist-3.3.1, hypothesis-5.35.1, cpp-2.3.0, xdoctest-1.1.0, subtests-0.13.1, typeguard-4.3.0 2025-08-14T23:40:40.9325122Z collecting ... collected 4958 items / 153 deselected / 4805 selected 2025-08-14T23:40:40.9325432Z stepcurrent: skipping 153 already run items. 2025-08-14T23:40:40.9325677Z Running 1513 items in this shard 2025-08-14T23:40:40.9325816Z 2025-08-14T23:40:40.9326051Z test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseBSC_int32_cuda_float64 PASSED [0.6370s] [ 0%] 2025-08-14T23:40:40.9326605Z test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseBSC_int64_cuda_bfloat16 PASSED [0.0381s] [ 0%] 2025-08-14T23:40:40.9327121Z test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseBSC_int64_cuda_bool PASSED [0.0192s] [ 0%] 2025-08-14T23:40:40.9327724Z test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseBSC_int64_cuda_complex128 PASSED [0.0136s] [ 0%] 2025-08-14T23:40:40.9328258Z test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseBSC_int64_cuda_int64 PASSED [0.0108s] [ 0%] 2025-08-14T23:40:40.9328782Z test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseBSR_int32_cuda_complex128 PASSED [0.0097s] [ 0%] 2025-08-14T23:40:40.9329312Z test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseBSR_int32_cuda_float32 PASSED [0.0096s] [ 0%] 2025-08-14T23:40:40.9329827Z test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseBSR_int32_cuda_int16 PASSED [0.0087s] [ 0%] 2025-08-14T23:40:40.9330401Z test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseBSR_int32_cuda_int64 PASSED [0.0086s] [ 0%] 2025-08-14T23:40:40.9330918Z test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseBSR_int64_cuda_bool PASSED [0.0086s] [ 0%] 2025-08-14T23:40:40.9331431Z test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseBSR_int64_cuda_float64 PASSED [0.0092s] [ 0%] 2025-08-14T23:40:40.9331952Z test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseBSR_int64_cuda_int16 PASSED [0.0086s] [ 0%] 2025-08-14T23:40:40.9332461Z test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseBSR_int64_cuda_int32 PASSED [0.0085s] [ 0%] 2025-08-14T23:40:40.9332974Z test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseBSR_int64_cuda_uint8 PASSED [0.0087s] [ 0%] 2025-08-14T23:40:40.9333496Z test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseCSC_int32_cuda_complex128 PASSED [0.0580s] [ 0%] 2025-08-14T23:40:40.9334021Z test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseCSC_int32_cuda_int16 PASSED [0.0163s] [ 1%] 2025-08-14T23:40:40.9334544Z test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseCSC_int64_cuda_bfloat16 PASSED [0.0169s] [ 1%] 2025-08-14T23:40:40.9335068Z test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseCSC_int64_cuda_bool PASSED [0.0162s] [ 1%] 2025-08-14T23:40:40.9335585Z test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseCSC_int64_cuda_complex128 PASSED [0.0169s] [ 1%] 2025-08-14T23:40:40.9336121Z test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseCSC_int64_cuda_complex64 PASSED [0.0185s] [ 1%] 2025-08-14T23:40:40.9336647Z test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseCSC_int64_cuda_float64 PASSED [0.0168s] [ 1%] 2025-08-14T23:40:40.9337164Z test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseCSC_int64_cuda_int16 PASSED [0.0163s] [ 1%] 2025-08-14T23:40:40.9337674Z test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseCSC_int64_cuda_int32 PASSED [0.0164s] [ 1%] 2025-08-14T23:40:40.9338181Z test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseCSC_int64_cuda_int64 PASSED [0.0163s] [ 1%] 2025-08-14T23:40:40.9338691Z test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseCSR_int32_cuda_int16 PASSED [0.0121s] [ 1%] 2025-08-14T23:40:40.9339291Z test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseCSR_int32_cuda_int8 PASSED [0.0120s] [ 1%] 2025-08-14T23:40:40.9339807Z test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseCSR_int32_cuda_uint8 PASSED [0.0120s] [ 1%] 2025-08-14T23:40:40.9340316Z test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseCSR_int64_cuda_bool PASSED [0.0120s] [ 1%] 2025-08-14T23:40:40.9340901Z test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseCSR_int64_cuda_float32 PASSED [0.0126s] [ 1%] 2025-08-14T23:40:40.9341408Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_add_cuda_complex64 PASSED [0.3735s] [ 1%] 2025-08-14T23:40:40.9341885Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_add_cuda_float64 PASSED [0.3379s] [ 2%] 2025-08-14T23:40:40.9342374Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_add_errors_cuda_complex128 PASSED [0.0052s] [ 2%] 2025-08-14T23:40:40.9342887Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_add_errors_cuda_float64 PASSED [0.0039s] [ 2%] 2025-08-14T23:40:40.9343380Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_addmm_cuda_complex128 PASSED [1.1870s] [ 2%] 2025-08-14T23:40:40.9343922Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_addmm_cuda_float64 PASSED [0.0731s] [ 2%] 2025-08-14T23:40:40.9344410Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csc_to_dense_cuda_bool PASSED [0.0192s] [ 2%] 2025-08-14T23:40:40.9344910Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csc_to_dense_cuda_float16 PASSED [0.0165s] [ 2%] 2025-08-14T23:40:40.9345411Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csc_to_dense_cuda_float32 PASSED [0.0064s] [ 2%] 2025-08-14T23:40:40.9345912Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csc_to_dense_cuda_int16 PASSED [0.0054s] [ 2%] 2025-08-14T23:40:40.9346403Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csc_to_dense_cuda_int32 PASSED [0.0054s] [ 2%] 2025-08-14T23:40:40.9346939Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csc_to_dense_cuda_int64 PASSED [0.0054s] [ 2%] 2025-08-14T23:40:40.9347450Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_from_dense_cuda_bfloat16 PASSED [0.0029s] [ 2%] 2025-08-14T23:40:40.9347976Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_from_dense_cuda_complex128 PASSED [0.0029s] [ 2%] 2025-08-14T23:40:40.9348498Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_from_dense_cuda_int64 PASSED [0.0026s] [ 2%] 2025-08-14T23:40:40.9349003Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_from_dense_cuda_uint8 PASSED [0.0026s] [ 2%] 2025-08-14T23:40:40.9349500Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_to_dense_cuda_bool PASSED [0.0052s] [ 3%] 2025-08-14T23:40:40.9350002Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_to_dense_cuda_complex64 PASSED [0.0065s] [ 3%] 2025-08-14T23:40:40.9350505Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_to_dense_cuda_float16 PASSED [0.0062s] [ 3%] 2025-08-14T23:40:40.9351008Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_to_dense_cuda_int16 PASSED [0.0051s] [ 3%] 2025-08-14T23:40:40.9351502Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_to_dense_cuda_uint8 PASSED [0.0051s] [ 3%] 2025-08-14T23:40:40.9352033Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_abs_cuda_complex32 PASSED [0.0779s] [ 3%] 2025-08-14T23:40:40.9352595Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_abs_cuda_float16 PASSED [0.0049s] [ 3%] 2025-08-14T23:40:40.9353145Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_abs_cuda_int64 PASSED [0.0046s] [ 3%] 2025-08-14T23:40:40.9353806Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_angle_cuda_bool SKIPPED [0.0037s] (Skipped! Inplace variant not supported!) [ 3%] 2025-08-14T23:40:40.9354592Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_angle_cuda_complex64 SKIPPED [0.0034s] (Skipped! Inplace variant not supported!) [ 3%] 2025-08-14T23:40:40.9355460Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_angle_cuda_float32 SKIPPED [0.0036s] (Skipped! Inplace variant not supported!) [ 3%] 2025-08-14T23:40:40.9356440Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_angle_cuda_int32 SKIPPED [0.0034s] (Skipped! Inplace variant not supported!) [ 3%] 2025-08-14T23:40:40.9357206Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_angle_cuda_int64 SKIPPED [0.0041s] (Skipped! Inplace variant not supported!) [ 3%] 2025-08-14T23:40:40.9357917Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_asin_cuda_bool PASSED [0.0045s] [ 3%] 2025-08-14T23:40:40.9358472Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_asin_cuda_complex64 PASSED [0.0127s] [ 3%] 2025-08-14T23:40:40.9359037Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_asin_cuda_float16 PASSED [0.0047s] [ 4%] 2025-08-14T23:40:40.9359594Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_asin_cuda_float64 PASSED [0.0049s] [ 4%] 2025-08-14T23:40:40.9360143Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_asin_cuda_int8 PASSED [0.0044s] [ 4%] 2025-08-14T23:40:40.9360814Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_asin_cuda_uint8 PASSED [0.0043s] [ 4%] 2025-08-14T23:40:40.9361380Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_asinh_cuda_complex32 PASSED [0.0148s] [ 4%] 2025-08-14T23:40:40.9361952Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_asinh_cuda_complex64 PASSED [0.0047s] [ 4%] 2025-08-14T23:40:40.9362518Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_asinh_cuda_float64 PASSED [0.0049s] [ 4%] 2025-08-14T23:40:40.9363072Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_asinh_cuda_int32 PASSED [0.0044s] [ 4%] 2025-08-14T23:40:40.9363685Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_asinh_cuda_int8 PASSED [0.0045s] [ 4%] 2025-08-14T23:40:40.9364240Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_asinh_cuda_uint8 PASSED [0.0043s] [ 4%] 2025-08-14T23:40:40.9364795Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_atan_cuda_bfloat16 PASSED [0.0104s] [ 4%] 2025-08-14T23:40:40.9365368Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_atan_cuda_complex128 PASSED [0.0061s] [ 4%] 2025-08-14T23:40:40.9365934Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_atan_cuda_complex32 PASSED [0.0062s] [ 4%] 2025-08-14T23:40:40.9366502Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_atan_cuda_complex64 PASSED [0.0060s] [ 4%] 2025-08-14T23:40:40.9367062Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_atan_cuda_float16 PASSED [0.0049s] [ 4%] 2025-08-14T23:40:40.9367621Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_atan_cuda_float64 PASSED [0.0047s] [ 5%] 2025-08-14T23:40:40.9368171Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_atan_cuda_int8 PASSED [0.0044s] [ 5%] 2025-08-14T23:40:40.9368722Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_atanh_cuda_bfloat16 PASSED [0.0110s] [ 5%] 2025-08-14T23:40:40.9369294Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_atanh_cuda_complex32 PASSED [0.0060s] [ 5%] 2025-08-14T23:40:40.9369861Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_atanh_cuda_int16 PASSED [0.0045s] [ 5%] 2025-08-14T23:40:40.9370409Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_ceil_cuda_float16 PASSED [0.0237s] [ 5%] 2025-08-14T23:40:40.9370972Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_ceil_cuda_float32 PASSED [0.0049s] [ 5%] 2025-08-14T23:40:40.9371524Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_ceil_cuda_float64 PASSED [0.0047s] [ 5%] 2025-08-14T23:40:40.9372075Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_ceil_cuda_uint8 PASSED [0.0049s] [ 5%] 2025-08-14T23:40:40.9372720Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_conj_physical_cuda_bool PASSED [0.0045s] [ 5%] 2025-08-14T23:40:40.9373334Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_conj_physical_cuda_complex128 PASSED [0.0104s] [ 5%] 2025-08-14T23:40:40.9374024Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_conj_physical_cuda_complex32 PASSED [0.0060s] [ 5%] 2025-08-14T23:40:40.9374642Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_conj_physical_cuda_float64 PASSED [0.0047s] [ 5%] 2025-08-14T23:40:40.9375243Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_conj_physical_cuda_int16 PASSED [0.0045s] [ 5%] 2025-08-14T23:40:40.9375836Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_conj_physical_cuda_int64 PASSED [0.0045s] [ 5%] 2025-08-14T23:40:40.9376428Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_conj_physical_cuda_uint8 PASSED [0.0046s] [ 6%] 2025-08-14T23:40:40.9377012Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_deg2rad_cuda_bfloat16 PASSED [0.0048s] [ 6%] 2025-08-14T23:40:40.9377637Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_deg2rad_cuda_int64 PASSED [0.0046s] [ 6%] 2025-08-14T23:40:40.9378198Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_erf_cuda_bool PASSED [0.0189s] [ 6%] 2025-08-14T23:40:40.9378742Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_erf_cuda_float32 PASSED [0.0049s] [ 6%] 2025-08-14T23:40:40.9379290Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_erf_cuda_int16 PASSED [0.0044s] [ 6%] 2025-08-14T23:40:40.9379826Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_erf_cuda_int64 PASSED [0.0045s] [ 6%] 2025-08-14T23:40:40.9380429Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_erfinv_cuda_float16 PASSED [0.0058s] [ 6%] 2025-08-14T23:40:40.9381000Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_erfinv_cuda_uint8 PASSED [0.0045s] [ 6%] 2025-08-14T23:40:40.9381549Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_expm1_cuda_bool PASSED [0.0473s] [ 6%] 2025-08-14T23:40:40.9382121Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_expm1_cuda_complex128 PASSED [0.0051s] [ 6%] 2025-08-14T23:40:40.9382693Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_expm1_cuda_float32 PASSED [0.0048s] [ 6%] 2025-08-14T23:40:40.9383250Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_expm1_cuda_float64 PASSED [0.0047s] [ 6%] 2025-08-14T23:40:40.9383807Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_expm1_cuda_int64 PASSED [0.0045s] [ 6%] 2025-08-14T23:40:40.9384371Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_floor_cuda_bfloat16 PASSED [0.0048s] [ 6%] 2025-08-14T23:40:40.9384936Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_floor_cuda_float16 PASSED [0.0049s] [ 7%] 2025-08-14T23:40:40.9385491Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_floor_cuda_int32 PASSED [0.0046s] [ 7%] 2025-08-14T23:40:40.9386044Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_floor_cuda_int64 PASSED [0.0047s] [ 7%] 2025-08-14T23:40:40.9386592Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_floor_cuda_int8 PASSED [0.0046s] [ 7%] 2025-08-14T23:40:40.9387141Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_frac_cuda_bfloat16 PASSED [0.0052s] [ 7%] 2025-08-14T23:40:40.9387700Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_frac_cuda_float64 PASSED [0.0047s] [ 7%] 2025-08-14T23:40:40.9388376Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_isinf_cuda_bfloat16 SKIPPED [0.0034s] (Skipped! Inplace variant not supported!) [ 7%] 2025-08-14T23:40:40.9389331Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_isinf_cuda_complex32 SKIPPED [0.0034s] (Skipped! Inplace variant not supported!) [ 7%] 2025-08-14T23:40:40.9390116Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_isinf_cuda_float16 SKIPPED [0.0034s] (Skipped! Inplace variant not supported!) [ 7%] 2025-08-14T23:40:40.9390949Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_isinf_cuda_int16 SKIPPED [0.0035s] (Skipped! Inplace variant not supported!) [ 7%] 2025-08-14T23:40:40.9391717Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_isinf_cuda_int8 SKIPPED [0.0034s] (Skipped! Inplace variant not supported!) [ 7%] 2025-08-14T23:40:40.9392486Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_isinf_cuda_uint8 SKIPPED [0.0035s] (Skipped! Inplace variant not supported!) [ 7%] 2025-08-14T23:40:40.9393248Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_isnan_cuda_bool SKIPPED [0.0034s] (Skipped! Inplace variant not supported!) [ 7%] 2025-08-14T23:40:40.9394031Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_isnan_cuda_complex128 SKIPPED [0.0036s] (Skipped! Inplace variant not supported!) [ 7%] 2025-08-14T23:40:40.9394865Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_isnan_cuda_int16 SKIPPED [0.0034s] (Skipped! Inplace variant not supported!) [ 7%] 2025-08-14T23:40:40.9395633Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_isnan_cuda_int8 SKIPPED [0.0036s] (Skipped! Inplace variant not supported!) [ 7%] 2025-08-14T23:40:40.9396396Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_isnan_cuda_uint8 SKIPPED [0.0034s] (Skipped! Inplace variant not supported!) [ 8%] 2025-08-14T23:40:40.9397176Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_isneginf_cuda_int16 SKIPPED [0.0036s] (Skipped! Inplace variant not supported!) [ 8%] 2025-08-14T23:40:40.9398023Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_isneginf_cuda_int64 SKIPPED [0.0034s] (Skipped! Inplace variant not supported!) [ 8%] 2025-08-14T23:40:40.9398812Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_isneginf_cuda_uint8 SKIPPED [0.0036s] (Skipped! Inplace variant not supported!) [ 8%] 2025-08-14T23:40:40.9399625Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_isposinf_cuda_bfloat16 SKIPPED [0.0034s] (Skipped! Inplace variant not supported!) [ 8%] 2025-08-14T23:40:40.9400461Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_isposinf_cuda_bool SKIPPED [0.0035s] (Skipped! Inplace variant not supported!) [ 8%] 2025-08-14T23:40:40.9401259Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_isposinf_cuda_float32 SKIPPED [0.0034s] (Skipped! Inplace variant not supported!) [ 8%] 2025-08-14T23:40:40.9402049Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_isposinf_cuda_int32 SKIPPED [0.0036s] (Skipped! Inplace variant not supported!) [ 8%] 2025-08-14T23:40:40.9402726Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_log1p_cuda_bfloat16 PASSED [0.0193s] [ 8%] 2025-08-14T23:40:40.9403284Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_log1p_cuda_bool PASSED [0.0046s] [ 8%] 2025-08-14T23:40:40.9403840Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_log1p_cuda_float64 PASSED [0.0047s] [ 8%] 2025-08-14T23:40:40.9404409Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_log1p_cuda_int32 PASSED [0.0045s] [ 8%] 2025-08-14T23:40:40.9404961Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_log1p_cuda_int64 PASSED [0.0043s] [ 8%] 2025-08-14T23:40:40.9405513Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_log1p_cuda_uint8 PASSED [0.0048s] [ 8%] 2025-08-14T23:40:40.9406067Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_neg_cuda_bfloat16 PASSED [0.0298s] [ 8%] 2025-08-14T23:40:40.9406705Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_neg_cuda_complex128 PASSED [0.0064s] [ 9%] 2025-08-14T23:40:40.9407271Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_neg_cuda_complex32 PASSED [0.0060s] [ 9%] 2025-08-14T23:40:40.9407828Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_neg_cuda_float16 PASSED [0.0048s] [ 9%] 2025-08-14T23:40:40.9408438Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_neg_cuda_float64 PASSED [0.0050s] [ 9%] 2025-08-14T23:40:40.9408983Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_neg_cuda_int32 PASSED [0.0046s] [ 9%] 2025-08-14T23:40:40.9409521Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_neg_cuda_int64 PASSED [0.0048s] [ 9%] 2025-08-14T23:40:40.9410055Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_neg_cuda_int8 PASSED [0.0047s] [ 9%] 2025-08-14T23:40:40.9410592Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_neg_cuda_uint8 PASSED [0.0047s] [ 9%] 2025-08-14T23:40:40.9411346Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_nn_functional_relu_cuda_uint8 SKIPPED [0.0034s] (Skipped! Inplace variant not supported!) [ 9%] 2025-08-14T23:40:40.9412180Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_positive_cuda_bfloat16 SKIPPED [0.0036s] (Skipped! Inplace variant not supported!) [ 9%] 2025-08-14T23:40:40.9412994Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_positive_cuda_complex64 SKIPPED [0.0036s] (Skipped! Inplace variant not supported!) [ 9%] 2025-08-14T23:40:40.9413788Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_positive_cuda_int8 SKIPPED [0.0037s] (Skipped! Inplace variant not supported!) [ 9%] 2025-08-14T23:40:40.9414651Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_positive_cuda_uint8 SKIPPED [0.0035s] (Skipped! Inplace variant not supported!) [ 9%] 2025-08-14T23:40:40.9415333Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_rad2deg_cuda_int32 PASSED [0.0046s] [ 9%] 2025-08-14T23:40:40.9415894Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_rad2deg_cuda_int8 PASSED [0.0044s] [ 9%] 2025-08-14T23:40:40.9416456Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_round_cuda_float32 PASSED [0.0048s] [ 10%] 2025-08-14T23:40:40.9417017Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_round_cuda_float64 PASSED [0.0048s] [ 10%] 2025-08-14T23:40:40.9417570Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_round_cuda_int16 PASSED [0.0047s] [ 10%] 2025-08-14T23:40:40.9418126Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_round_cuda_int64 PASSED [0.0047s] [ 10%] 2025-08-14T23:40:40.9418701Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_sgn_cuda_bfloat16 PASSED [0.0048s] [ 10%] 2025-08-14T23:40:40.9419382Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_signbit_cuda_int16 SKIPPED [0.0036s] (Skipped! Inplace variant not supported!) [ 10%] 2025-08-14T23:40:40.9420174Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_signbit_cuda_uint8 SKIPPED [0.0034s] (Skipped! Inplace variant not supported!) [ 10%] 2025-08-14T23:40:40.9420850Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_sin_cuda_bool PASSED [0.0113s] [ 10%] 2025-08-14T23:40:40.9421414Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_sin_cuda_complex128 PASSED [0.0062s] [ 10%] 2025-08-14T23:40:40.9421986Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_sin_cuda_complex64 PASSED [0.0065s] [ 10%] 2025-08-14T23:40:40.9422550Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_sin_cuda_uint8 PASSED [0.0044s] [ 10%] 2025-08-14T23:40:40.9423102Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_sinh_cuda_bool PASSED [0.0107s] [ 10%] 2025-08-14T23:40:40.9423725Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_sinh_cuda_complex64 PASSED [0.0061s] [ 10%] 2025-08-14T23:40:40.9424298Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_sinh_cuda_float16 PASSED [0.0048s] [ 10%] 2025-08-14T23:40:40.9424852Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_sinh_cuda_int16 PASSED [0.0045s] [ 10%] 2025-08-14T23:40:40.9425458Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_sinh_cuda_uint8 PASSED [0.0044s] [ 11%] 2025-08-14T23:40:40.9426002Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_sqrt_cuda_bool PASSED [0.0046s] [ 11%] 2025-08-14T23:40:40.9426564Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_sqrt_cuda_complex128 PASSED [0.0061s] [ 11%] 2025-08-14T23:40:40.9427139Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_sqrt_cuda_complex32 PASSED [0.0063s] [ 11%] 2025-08-14T23:40:40.9427705Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_sqrt_cuda_float64 PASSED [0.0049s] [ 11%] 2025-08-14T23:40:40.9428257Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_sqrt_cuda_int16 PASSED [0.0045s] [ 11%] 2025-08-14T23:40:40.9428852Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_sqrt_cuda_int8 PASSED [0.0044s] [ 11%] 2025-08-14T23:40:40.9429414Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_tan_cuda_complex32 PASSED [0.0063s] [ 11%] 2025-08-14T23:40:40.9429977Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_tan_cuda_float32 PASSED [0.0111s] [ 11%] 2025-08-14T23:40:40.9430534Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_tan_cuda_float64 PASSED [0.0047s] [ 11%] 2025-08-14T23:40:40.9431084Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_tan_cuda_int16 PASSED [0.0044s] [ 11%] 2025-08-14T23:40:40.9431677Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_tan_cuda_int32 PASSED [0.0044s] [ 11%] 2025-08-14T23:40:40.9432219Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_tan_cuda_int8 PASSED [0.0045s] [ 11%] 2025-08-14T23:40:40.9432762Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_tanh_cuda_bool PASSED [0.0102s] [ 11%] 2025-08-14T23:40:40.9433310Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_tanh_cuda_int32 PASSED [0.0045s] [ 11%] 2025-08-14T23:40:40.9433871Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_trunc_cuda_bfloat16 PASSED [0.0047s] [ 12%] 2025-08-14T23:40:40.9434443Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_inplace_trunc_cuda_float64 PASSED [0.0049s] [ 12%] 2025-08-14T23:40:40.9434995Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_abs_cuda_bfloat16 PASSED [0.0095s] [ 12%] 2025-08-14T23:40:40.9435533Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_abs_cuda_int64 PASSED [0.0094s] [ 12%] 2025-08-14T23:40:40.9436064Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_abs_cuda_int8 PASSED [0.0091s] [ 12%] 2025-08-14T23:40:40.9436594Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_angle_cuda_bool PASSED [0.0098s] [ 12%] 2025-08-14T23:40:40.9437147Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_angle_cuda_complex32 PASSED [0.0107s] [ 12%] 2025-08-14T23:40:40.9437698Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_angle_cuda_float64 PASSED [0.0096s] [ 12%] 2025-08-14T23:40:40.9438237Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_angle_cuda_int16 PASSED [0.0092s] [ 12%] 2025-08-14T23:40:40.9438769Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_asin_cuda_bool PASSED [0.0083s] [ 12%] 2025-08-14T23:40:40.9439315Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_asin_cuda_complex128 PASSED [0.0094s] [ 12%] 2025-08-14T23:40:40.9439870Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_asin_cuda_complex64 PASSED [0.0093s] [ 12%] 2025-08-14T23:40:40.9440533Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_asin_cuda_float32 PASSED [0.0093s] [ 12%] 2025-08-14T23:40:40.9441077Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_asinh_cuda_float32 PASSED [0.0092s] [ 12%] 2025-08-14T23:40:40.9441619Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_asinh_cuda_int16 PASSED [0.0095s] [ 12%] 2025-08-14T23:40:40.9442225Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_asinh_cuda_int32 PASSED [0.0091s] [ 13%] 2025-08-14T23:40:40.9442749Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_asinh_cuda_int8 PASSED [0.0095s] [ 13%] 2025-08-14T23:40:40.9443288Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_atan_cuda_bool PASSED [0.0093s] [ 13%] 2025-08-14T23:40:40.9443837Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_atan_cuda_complex32 PASSED [0.0094s] [ 13%] 2025-08-14T23:40:40.9444389Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_atan_cuda_float32 PASSED [0.0091s] [ 13%] 2025-08-14T23:40:40.9444928Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_atan_cuda_int32 PASSED [0.0094s] [ 13%] 2025-08-14T23:40:40.9445533Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_atanh_cuda_bfloat16 PASSED [0.0091s] [ 13%] 2025-08-14T23:40:40.9446083Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_atanh_cuda_bool PASSED [0.0085s] [ 13%] 2025-08-14T23:40:40.9446629Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_atanh_cuda_complex128 PASSED [0.0106s] [ 13%] 2025-08-14T23:40:40.9447193Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_atanh_cuda_complex32 PASSED [0.0093s] [ 13%] 2025-08-14T23:40:40.9447742Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_atanh_cuda_uint8 PASSED [0.0081s] [ 13%] 2025-08-14T23:40:40.9448335Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_ceil_cuda_float16 PASSED [0.0091s] [ 13%] 2025-08-14T23:40:40.9448878Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_ceil_cuda_int16 PASSED [0.0091s] [ 13%] 2025-08-14T23:40:40.9449404Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_ceil_cuda_int32 PASSED [0.0090s] [ 13%] 2025-08-14T23:40:40.9449973Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_conj_physical_cuda_complex128 PASSED [0.0093s] [ 13%] 2025-08-14T23:40:40.9450583Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_conj_physical_cuda_complex64 PASSED [0.0092s] [ 14%] 2025-08-14T23:40:40.9451186Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_conj_physical_cuda_float16 PASSED [0.0095s] [ 14%] 2025-08-14T23:40:40.9451769Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_conj_physical_cuda_int16 PASSED [0.0089s] [ 14%] 2025-08-14T23:40:40.9452342Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_conj_physical_cuda_int32 PASSED [0.0090s] [ 14%] 2025-08-14T23:40:40.9452913Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_conj_physical_cuda_int8 PASSED [0.0088s] [ 14%] 2025-08-14T23:40:40.9453467Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_deg2rad_cuda_bool PASSED [0.0092s] [ 14%] 2025-08-14T23:40:40.9454006Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_deg2rad_cuda_int16 PASSED [0.0091s] [ 14%] 2025-08-14T23:40:40.9454548Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_deg2rad_cuda_int32 PASSED [0.0093s] [ 14%] 2025-08-14T23:40:40.9455087Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_deg2rad_cuda_int64 PASSED [0.0092s] [ 14%] 2025-08-14T23:40:40.9455621Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_erf_cuda_float16 PASSED [0.0091s] [ 14%] 2025-08-14T23:40:40.9456155Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_erf_cuda_float64 PASSED [0.0092s] [ 14%] 2025-08-14T23:40:40.9456759Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_erf_cuda_int8 PASSED [0.0091s] [ 14%] 2025-08-14T23:40:40.9457286Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_erfinv_cuda_bool PASSED [0.0085s] [ 14%] 2025-08-14T23:40:40.9457827Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_erfinv_cuda_float16 PASSED [0.0091s] [ 14%] 2025-08-14T23:40:40.9458380Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_erfinv_cuda_float32 PASSED [0.0103s] [ 14%] 2025-08-14T23:40:40.9458980Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_erfinv_cuda_int32 PASSED [0.0104s] [ 15%] 2025-08-14T23:40:40.9459511Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_expm1_cuda_bool PASSED [0.0094s] [ 15%] 2025-08-14T23:40:40.9460047Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_expm1_cuda_float16 PASSED [0.0093s] [ 15%] 2025-08-14T23:40:40.9460592Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_expm1_cuda_float32 PASSED [0.0093s] [ 15%] 2025-08-14T23:40:40.9461129Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_expm1_cuda_float64 PASSED [0.0091s] [ 15%] 2025-08-14T23:40:40.9461722Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_expm1_cuda_int32 PASSED [0.0094s] [ 15%] 2025-08-14T23:40:40.9462257Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_expm1_cuda_int8 PASSED [0.0092s] [ 15%] 2025-08-14T23:40:40.9462790Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_floor_cuda_float32 PASSED [0.0093s] [ 15%] 2025-08-14T23:40:40.9463329Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_floor_cuda_int16 PASSED [0.0090s] [ 15%] 2025-08-14T23:40:40.9463859Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_floor_cuda_uint8 PASSED [0.0090s] [ 15%] 2025-08-14T23:40:40.9464491Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_isinf_cuda_complex128 SKIPPED [0.0034s] (Skipped! Out not supported) [ 15%] 2025-08-14T23:40:40.9465271Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_isinf_cuda_complex64 SKIPPED [0.0034s] (Skipped! Out not supported) [ 15%] 2025-08-14T23:40:40.9465985Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_isinf_cuda_float16 SKIPPED [0.0038s] (Skipped! Out not supported) [ 15%] 2025-08-14T23:40:40.9466691Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_isinf_cuda_float32 SKIPPED [0.0034s] (Skipped! Out not supported) [ 15%] 2025-08-14T23:40:40.9467390Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_isinf_cuda_uint8 SKIPPED [0.0036s] (Skipped! Out not supported) [ 15%] 2025-08-14T23:40:40.9468091Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_isnan_cuda_bfloat16 SKIPPED [0.0034s] (Skipped! Out not supported) [ 15%] 2025-08-14T23:40:40.9468807Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_isnan_cuda_complex128 SKIPPED [0.0036s] (Skipped! Out not supported) [ 16%] 2025-08-14T23:40:40.9469532Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_isnan_cuda_complex64 SKIPPED [0.0034s] (Skipped! Out not supported) [ 16%] 2025-08-14T23:40:40.9470240Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_isnan_cuda_float16 SKIPPED [0.0036s] (Skipped! Out not supported) [ 16%] 2025-08-14T23:40:40.9470938Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_isnan_cuda_float32 SKIPPED [0.0034s] (Skipped! Out not supported) [ 16%] 2025-08-14T23:40:40.9471634Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_isnan_cuda_int16 SKIPPED [0.0036s] (Skipped! Out not supported) [ 16%] 2025-08-14T23:40:40.9472327Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_isnan_cuda_int32 SKIPPED [0.0034s] (Skipped! Out not supported) [ 16%] 2025-08-14T23:40:40.9473017Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_isnan_cuda_uint8 SKIPPED [0.0034s] (Skipped! Out not supported) [ 16%] 2025-08-14T23:40:40.9473646Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_isneginf_cuda_bfloat16 PASSED [0.0536s] [ 16%] 2025-08-14T23:40:40.9474271Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_isneginf_cuda_bool PASSED [0.0093s] [ 16%] 2025-08-14T23:40:40.9474819Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_isneginf_cuda_int8 PASSED [0.0093s] [ 16%] 2025-08-14T23:40:40.9475377Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_isposinf_cuda_float64 PASSED [0.0090s] [ 16%] 2025-08-14T23:40:40.9475991Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_isposinf_cuda_int32 PASSED [0.0092s] [ 16%] 2025-08-14T23:40:40.9476526Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_log1p_cuda_bool PASSED [0.0092s] [ 16%] 2025-08-14T23:40:40.9477063Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_log1p_cuda_float32 PASSED [0.0093s] [ 16%] 2025-08-14T23:40:40.9477602Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_log1p_cuda_float64 PASSED [0.0092s] [ 16%] 2025-08-14T23:40:40.9478138Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_log1p_cuda_int16 PASSED [0.0093s] [ 17%] 2025-08-14T23:40:40.9478670Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_log1p_cuda_int8 PASSED [0.0092s] [ 17%] 2025-08-14T23:40:40.9479266Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_neg_cuda_float16 PASSED [0.0094s] [ 17%] 2025-08-14T23:40:40.9479802Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_neg_cuda_int64 PASSED [0.0090s] [ 17%] 2025-08-14T23:40:40.9480508Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_nn_functional_relu_cuda_float16 SKIPPED [0.0036s] (Skipped! Out not supported) [ 17%] 2025-08-14T23:40:40.9481291Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_nn_functional_relu_cuda_float32 SKIPPED [0.0034s] (Skipped! Out not supported) [ 17%] 2025-08-14T23:40:40.9482124Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_nn_functional_relu_cuda_float64 SKIPPED [0.0039s] (Skipped! Out not supported) [ 17%] 2025-08-14T23:40:40.9482896Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_nn_functional_relu_cuda_int16 SKIPPED [0.0034s] (Skipped! Out not supported) [ 17%] 2025-08-14T23:40:40.9483648Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_nn_functional_relu_cuda_int32 SKIPPED [0.0034s] (Skipped! Out not supported) [ 17%] 2025-08-14T23:40:40.9484403Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_nn_functional_relu_cuda_int8 SKIPPED [0.0034s] (Skipped! Out not supported) [ 17%] 2025-08-14T23:40:40.9485156Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_nn_functional_relu_cuda_uint8 SKIPPED [0.0034s] (Skipped! Out not supported) [ 17%] 2025-08-14T23:40:40.9485905Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_positive_cuda_complex128 SKIPPED [0.0036s] (Skipped! Out not supported) [ 17%] 2025-08-14T23:40:40.9486646Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_positive_cuda_complex32 SKIPPED [0.0034s] (Skipped! Out not supported) [ 17%] 2025-08-14T23:40:40.9487370Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_positive_cuda_int16 SKIPPED [0.0036s] (Skipped! Out not supported) [ 17%] 2025-08-14T23:40:40.9488075Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_positive_cuda_int32 SKIPPED [0.0035s] (Skipped! Out not supported) [ 17%] 2025-08-14T23:40:40.9488779Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_positive_cuda_int64 SKIPPED [0.0036s] (Skipped! Out not supported) [ 18%] 2025-08-14T23:40:40.9489417Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_rad2deg_cuda_float16 PASSED [0.0091s] [ 18%] 2025-08-14T23:40:40.9489980Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_rad2deg_cuda_int8 PASSED [0.0093s] [ 18%] 2025-08-14T23:40:40.9490519Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_rad2deg_cuda_uint8 PASSED [0.0091s] [ 18%] 2025-08-14T23:40:40.9491138Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_round_cuda_float16 PASSED [0.0093s] [ 18%] 2025-08-14T23:40:40.9491715Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_round_cuda_int16 PASSED [0.0089s] [ 18%] 2025-08-14T23:40:40.9492259Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_round_cuda_int8 PASSED [0.0092s] [ 18%] 2025-08-14T23:40:40.9492868Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_sgn_cuda_complex64 PASSED [0.0108s] [ 18%] 2025-08-14T23:40:40.9493407Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_sgn_cuda_float64 PASSED [0.0092s] [ 18%] 2025-08-14T23:40:40.9493933Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_sgn_cuda_int16 PASSED [0.0091s] [ 18%] 2025-08-14T23:40:40.9494458Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_sgn_cuda_int32 PASSED [0.0090s] [ 18%] 2025-08-14T23:40:40.9494984Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_sgn_cuda_int64 PASSED [0.0092s] [ 18%] 2025-08-14T23:40:40.9495505Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_sgn_cuda_int8 PASSED [0.0090s] [ 18%] 2025-08-14T23:40:40.9496095Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_sgn_cuda_uint8 PASSED [0.0092s] [ 18%] 2025-08-14T23:40:40.9496633Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_sign_cuda_float64 PASSED [0.0092s] [ 18%] 2025-08-14T23:40:40.9497181Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_sign_cuda_int16 PASSED [0.0092s] [ 19%] 2025-08-14T23:40:40.9497702Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_sign_cuda_int32 PASSED [0.0090s] [ 19%] 2025-08-14T23:40:40.9498229Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_sign_cuda_uint8 PASSED [0.0094s] [ 19%] 2025-08-14T23:40:40.9498767Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_signbit_cuda_float16 PASSED [0.0091s] [ 19%] 2025-08-14T23:40:40.9499373Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_signbit_cuda_int32 PASSED [0.0092s] [ 19%] 2025-08-14T23:40:40.9499919Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_signbit_cuda_int64 PASSED [0.0090s] [ 19%] 2025-08-14T23:40:40.9500474Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_signbit_cuda_int8 PASSED [0.0090s] [ 19%] 2025-08-14T23:40:40.9501011Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_signbit_cuda_uint8 PASSED [0.0090s] [ 19%] 2025-08-14T23:40:40.9501548Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_sin_cuda_bfloat16 PASSED [0.0092s] [ 19%] 2025-08-14T23:40:40.9502078Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_sin_cuda_float32 PASSED [0.0093s] [ 19%] 2025-08-14T23:40:40.9502598Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_sin_cuda_int32 PASSED [0.0092s] [ 19%] 2025-08-14T23:40:40.9503114Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_sin_cuda_int64 PASSED [0.0093s] [ 19%] 2025-08-14T23:40:40.9503639Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_sinh_cuda_bfloat16 PASSED [0.0092s] [ 19%] 2025-08-14T23:40:40.9504189Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_sinh_cuda_complex128 PASSED [0.0109s] [ 19%] 2025-08-14T23:40:40.9504734Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_sinh_cuda_float16 PASSED [0.0092s] [ 19%] 2025-08-14T23:40:40.9505264Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_sinh_cuda_int16 PASSED [0.0094s] [ 20%] 2025-08-14T23:40:40.9505799Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_sqrt_cuda_complex64 PASSED [0.0106s] [ 20%] 2025-08-14T23:40:40.9506340Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_sqrt_cuda_float16 PASSED [0.0094s] [ 20%] 2025-08-14T23:40:40.9506867Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_sqrt_cuda_int32 PASSED [0.0092s] [ 20%] 2025-08-14T23:40:40.9507389Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_sqrt_cuda_int8 PASSED [0.0094s] [ 20%] 2025-08-14T23:40:40.9507974Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_sqrt_cuda_uint8 PASSED [0.0091s] [ 20%] 2025-08-14T23:40:40.9508507Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_tan_cuda_bfloat16 PASSED [0.0094s] [ 20%] 2025-08-14T23:40:40.9509035Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_tan_cuda_bool PASSED [0.0091s] [ 20%] 2025-08-14T23:40:40.9509624Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_tan_cuda_complex128 PASSED [0.0106s] [ 20%] 2025-08-14T23:40:40.9510167Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_tan_cuda_complex32 PASSED [0.0094s] [ 20%] 2025-08-14T23:40:40.9510693Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_tan_cuda_int64 PASSED [0.0094s] [ 20%] 2025-08-14T23:40:40.9511236Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_tanh_cuda_complex128 PASSED [0.0107s] [ 20%] 2025-08-14T23:40:40.9511781Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_tanh_cuda_float32 PASSED [0.0092s] [ 20%] 2025-08-14T23:40:40.9512312Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_tanh_cuda_int32 PASSED [0.0097s] [ 20%] 2025-08-14T23:40:40.9512890Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_tanh_cuda_int8 PASSED [0.0093s] [ 20%] 2025-08-14T23:40:40.9513432Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_trunc_cuda_bfloat16 PASSED [0.0095s] [ 21%] 2025-08-14T23:40:40.9513973Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_trunc_cuda_float32 PASSED [0.0093s] [ 21%] 2025-08-14T23:40:40.9514506Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_csr_unary_out_trunc_cuda_float64 PASSED [0.0094s] [ 21%] 2025-08-14T23:40:40.9515010Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_mm_cuda_float32 PASSED [0.5245s] [ 21%] 2025-08-14T23:40:40.9515531Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_mm_cuda_float64 PASSED [0.0154s] [ 21%] 2025-08-14T23:40:40.9516073Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_mm_reduce_cuda_bfloat16 SKIPPED [0.0009s] (Only runs on cpu) [ 21%] 2025-08-14T23:40:40.9516675Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_mm_reduce_cuda_float64 SKIPPED [0.0008s] (Only runs on cpu) [ 21%] 2025-08-14T23:40:40.9517278Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_to_sparse_compressed_SparseCSR_cuda_float64 PASSED [0.1564s] [ 21%] 2025-08-14T23:40:40.9517980Z test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_triangular_solve_cuda_complex64 SKIPPED [0.0011s] (cuSparse Generic API SpSV is not available) [ 21%] 2025-08-14T23:40:40.9518835Z test_sparse_csr.py::TestSparseCSRCUDA::test_sum_cuda_bfloat16 PASSED [0.0459s] [ 21%] 2025-08-14T23:40:40.9519399Z test_sparse_csr.py::TestSparseCSRCUDA::test_sum_cuda_int16 PASSED [0.0284s] [ 21%] 2025-08-14T23:40:40.9519908Z test_sparse_csr.py::TestSparseCSRCUDA::test_sum_cuda_int64 PASSED [0.0282s] [ 21%] 2025-08-14T23:40:40.9520500Z test_sparse_csr.py::TestSparseCSRCUDA::test_sum_cuda_uint8 PASSED [0.0283s] [ 21%] 2025-08-14T23:40:40.9521063Z test_sparse_csr.py::TestSparseCSRCUDA::test_transpose_SparseBSC_cuda_bool PASSED [1.2247s] [ 21%] 2025-08-14T23:40:40.9521733Z test_sparse_csr.py::TestSparseCSRCUDA::test_transpose_SparseBSC_cuda_int16 PASSED [1.2242s] [ 21%] 2025-08-14T23:40:40.9522380Z test_sparse_csr.py::TestSparseCSRCUDA::test_transpose_SparseBSC_cuda_int64 PASSED [1.1240s] [ 22%] 2025-08-14T23:40:40.9522984Z test_sparse_csr.py::TestSparseCSRCUDA::test_transpose_SparseBSR_cuda_complex64 PASSED [1.2935s] [ 22%] 2025-08-14T23:40:40.9541642Z test_sparse_csr.py::TestSparseCSRCUDA::test_transpose_SparseBSR_cuda_int16 PASSED [1.1988s] [ 22%] 2025-08-14T23:40:40.9542237Z test_sparse_csr.py::TestSparseCSRCUDA::test_transpose_SparseCSC_cuda_float16 PASSED [0.7304s] [ 22%] 2025-08-14T23:40:40.9542792Z test_sparse_csr.py::TestSparseCSRCUDA::test_transpose_SparseCSC_cuda_float32 PASSED [0.7296s] [ 22%] 2025-08-14T23:40:40.9543466Z test_sparse_csr.py::TestSparseCSRCUDA::test_transpose_SparseCSC_cuda_float64 PASSED [0.7296s] [ 22%] 2025-08-14T23:40:40.9543990Z test_sparse_csr.py::TestSparseCSRCUDA::test_transpose_SparseCSC_cuda_uint8 PASSED [0.6932s] [ 22%] 2025-08-14T23:40:40.9544509Z test_sparse_csr.py::TestSparseCSRCUDA::test_transpose_SparseCSR_cuda_bool PASSED [0.7538s] [ 22%] 2025-08-14T23:40:40.9545040Z test_sparse_csr.py::TestSparseCSRCUDA::test_transpose_SparseCSR_cuda_complex64 PASSED [0.8126s] [ 22%] 2025-08-14T23:40:40.9545654Z test_sparse_csr.py::TestSparseCSRCUDA::test_transpose_SparseCSR_cuda_float16 PASSED [0.8005s] [ 22%] 2025-08-14T23:40:40.9546172Z test_sparse_csr.py::TestSparseCSRCUDA::test_transpose_SparseCSR_cuda_float32 PASSED [0.8018s] [ 22%] 2025-08-14T23:40:40.9546691Z test_sparse_csr.py::TestSparseCSRCUDA::test_transpose_SparseCSR_cuda_float64 PASSED [0.7956s] [ 22%] 2025-08-14T23:40:40.9547215Z test_sparse_csr.py::TestSparseCSRCUDA::test_transpose_SparseCSR_cuda_int32 PASSED [0.7624s] [ 22%] 2025-08-14T23:40:40.9547716Z test_sparse_csr.py::TestSparseCSRCUDA::test_transpose_SparseCSR_cuda_uint8 PASSED [0.7626s] [ 22%] 2025-08-14T23:40:40.9548295Z test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_abs_cuda_complex32 PASSED [0.0035s] [ 22%] 2025-08-14T23:40:40.9549007Z test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_abs_cuda_complex64 PASSED [0.0033s] [ 23%] 2025-08-14T23:40:40.9549631Z test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_abs_cuda_int64 PASSED [0.0030s] [ 23%] 2025-08-14T23:40:40.9550235Z test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_abs_cuda_uint8 PASSED [0.0031s] [ 23%] 2025-08-14T23:40:40.9550838Z test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_angle_cuda_bool PASSED [0.0032s] [ 23%] 2025-08-14T23:40:40.9551456Z test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_angle_cuda_float32 PASSED [0.0032s] [ 23%] 2025-08-14T23:40:40.9552135Z test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_angle_cuda_int32 PASSED [0.0032s] [ 23%] 2025-08-14T23:40:40.9552736Z test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_angle_cuda_int8 PASSED [0.0034s] [ 23%] 2025-08-14T23:40:40.9553327Z test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_angle_cuda_uint8 PASSED [0.0032s] [ 23%] 2025-08-14T23:40:40.9553921Z test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_asin_cuda_float16 PASSED [0.0032s] [ 23%] 2025-08-14T23:40:40.9554527Z test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_asinh_cuda_bool PASSED [0.0032s] [ 23%] 2025-08-14T23:40:40.9555151Z test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_asinh_cuda_complex128 PASSED [0.0033s] [ 23%] 2025-08-14T23:40:40.9555794Z test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_asinh_cuda_complex32 PASSED [0.0034s] [ 23%] 2025-08-14T23:40:40.9556425Z test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_asinh_cuda_uint8 PASSED [0.0032s] [ 23%] 2025-08-14T23:40:40.9557038Z test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_atan_cuda_bfloat16 PASSED [0.0032s] [ 23%] 2025-08-14T23:40:40.9557659Z test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_atan_cuda_complex128 PASSED [0.0032s] [ 23%] 2025-08-14T23:40:40.9558295Z test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_atan_cuda_complex32 PASSED [0.0034s] [ 23%] 2025-08-14T23:40:40.9558918Z test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_atan_cuda_complex64 PASSED [0.0032s] [ 24%] 2025-08-14T23:40:40.9559543Z test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_atanh_cuda_complex64 PASSED [0.0049s] [ 24%] 2025-08-14T23:40:40.9560166Z test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_atanh_cuda_float16 PASSED [0.0032s] [ 24%] 2025-08-14T23:40:40.9560913Z test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_atanh_cuda_int8 PASSED [0.0032s] [ 24%] 2025-08-14T23:40:40.9561518Z test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_atanh_cuda_uint8 PASSED [0.0032s] [ 24%] 2025-08-14T23:40:40.9562126Z test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_ceil_cuda_bfloat16 PASSED [0.0032s] [ 24%] 2025-08-14T23:40:40.9562810Z test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_ceil_cuda_float64 PASSED [0.0032s] [ 24%] 2025-08-14T23:40:40.9563407Z test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_ceil_cuda_int64 PASSED [0.0029s] [ 24%] 2025-08-14T23:40:40.9564002Z test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_ceil_cuda_int8 PASSED [0.0030s] [ 24%] 2025-08-14T23:40:40.9564644Z test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_conj_physical_cuda_complex128 PASSED [0.0033s] [ 24%] 2025-08-14T23:40:40.9565320Z test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_conj_physical_cuda_float64 PASSED [0.0030s] [ 24%] 2025-08-14T23:40:40.9566044Z test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_conj_physical_cuda_int16 PASSED [0.0027s] [ 24%] 2025-08-14T23:40:40.9566697Z test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_conj_physical_cuda_int8 PASSED [0.0027s] [ 24%] 2025-08-14T23:40:40.9567346Z test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_conj_physical_cuda_uint8 PASSED [0.0027s] [ 24%] 2025-08-14T23:40:40.9567978Z test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_deg2rad_cuda_bool PASSED [0.0032s] [ 24%] 2025-08-14T23:40:40.9568598Z test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_deg2rad_cuda_float16 PASSED [0.0032s] [ 25%] 2025-08-14T23:40:40.9569284Z test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_deg2rad_cuda_float32 PASSED [0.0032s] [ 25%] 2025-08-14T23:40:40.9569906Z test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_deg2rad_cuda_int64 PASSED [0.0033s] [ 25%] 2025-08-14T23:40:40.9570521Z test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_deg2rad_cuda_int8 PASSED [0.0032s] [ 25%] 2025-08-14T23:40:40.9571143Z test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_deg2rad_cuda_uint8 PASSED [0.0032s] [ 25%] 2025-08-14T23:40:40.9571749Z test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_erf_cuda_bool PASSED [0.0032s] [ 25%] 2025-08-14T23:40:40.9572350Z test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_erf_cuda_float16 PASSED [0.0032s] [ 25%] 2025-08-14T23:40:40.9572957Z test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_erf_cuda_float64 PASSED [0.0032s] [ 25%] 2025-08-14T23:40:40.9573555Z test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_erf_cuda_int32 PASSED [0.0032s] [ 25%] 2025-08-14T23:40:40.9574144Z test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_erf_cuda_int8 PASSED [0.0032s] [ 25%] 2025-08-14T23:40:40.9574734Z test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_erf_cuda_uint8 PASSED [0.0032s] [ 25%] 2025-08-14T23:40:40.9575341Z test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_erfinv_cuda_bool PASSED [0.0033s] [ 25%] 2025-08-14T23:40:40.9575965Z test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_erfinv_cuda_int32 PASSED [0.0032s] [ 25%] 2025-08-14T23:40:40.9576579Z test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_erfinv_cuda_int64 PASSED [0.0032s] [ 25%] 2025-08-14T23:40:40.9577202Z test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_expm1_cuda_bfloat16 PASSED [0.0032s] [ 25%] 2025-08-14T23:40:40.9577905Z test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_expm1_cuda_float32 PASSED [0.0032s] [ 26%] 2025-08-14T23:40:40.9578522Z test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_expm1_cuda_float64 PASSED [0.0032s] [ 26%] 2025-08-14T23:40:40.9579136Z test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_expm1_cuda_int8 PASSED [0.0032s] [ 26%] 2025-08-14T23:40:40.9579785Z test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_expm1_cuda_uint8 PASSED [0.0032s] [ 26%] 2025-08-14T23:40:40.9580399Z test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_floor_cuda_bfloat16 PASSED [0.0032s] [ 26%] 2025-08-14T23:40:40.9581027Z test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_floor_cuda_float64 PASSED [0.0033s] [ 26%] 2025-08-14T23:40:40.9581633Z test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_floor_cuda_int64 PASSED [0.0029s] [ 26%] 2025-08-14T23:40:40.9582234Z test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_floor_cuda_uint8 PASSED [0.0029s] [ 26%] 2025-08-14T23:40:40.9582901Z test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_isinf_cuda_complex64 PASSED [0.0031s] [ 26%] 2025-08-14T23:40:40.9583528Z test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_isinf_cuda_float64 PASSED [0.0030s] [ 26%] 2025-08-14T23:40:40.9584129Z test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_isinf_cuda_int16 PASSED [0.0030s] [ 26%] 2025-08-14T23:40:40.9584729Z test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_isinf_cuda_int32 PASSED [0.0030s] [ 26%] 2025-08-14T23:40:40.9585330Z test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_isinf_cuda_uint8 PASSED [0.0029s] [ 26%] 2025-08-14T23:40:40.9585984Z test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_isnan_cuda_bool PASSED [0.0029s] [ 26%] 2025-08-14T23:40:40.9586595Z test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_isnan_cuda_int32 PASSED [0.0031s] [ 26%] 2025-08-14T23:40:40.9587208Z test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_isnan_cuda_int64 PASSED [0.0029s] [ 27%] 2025-08-14T23:40:40.9587825Z test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_isneginf_cuda_bool PASSED [0.0029s] [ 27%] 2025-08-14T23:40:40.9588453Z test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_isposinf_cuda_bfloat16 PASSED [0.0030s] [ 27%] 2025-08-14T23:40:40.9589081Z test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_isposinf_cuda_bool PASSED [0.0029s] [ 27%] 2025-08-14T23:40:40.9589700Z test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_isposinf_cuda_int16 PASSED [0.0030s] [ 27%] 2025-08-14T23:40:40.9590326Z test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_log1p_cuda_bfloat16 PASSED [0.0032s] [ 27%] 2025-08-14T23:40:40.9590942Z test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_log1p_cuda_int8 PASSED [0.0032s] [ 27%] 2025-08-14T23:40:40.9591550Z test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_log1p_cuda_uint8 PASSED [0.0032s] [ 27%] 2025-08-14T23:40:40.9592159Z test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_neg_cuda_complex64 PASSED [0.0046s] [ 27%] 2025-08-14T23:40:40.9592771Z test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_neg_cuda_float32 PASSED [0.0032s] [ 27%] 2025-08-14T23:40:40.9593368Z test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_neg_cuda_int64 PASSED [0.0029s] [ 27%] 2025-08-14T23:40:40.9593962Z test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_neg_cuda_int8 PASSED [0.0029s] [ 27%] 2025-08-14T23:40:40.9594611Z test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_nn_functional_relu_cuda_bfloat16 PASSED [0.0032s] [ 27%] 2025-08-14T23:40:40.9595372Z test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_nn_functional_relu_cuda_int16 PASSED [0.0029s] [ 27%] 2025-08-14T23:40:40.9596037Z test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_nn_functional_relu_cuda_int8 PASSED [0.0029s] [ 27%] 2025-08-14T23:40:40.9596736Z test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_positive_cuda_float16 PASSED [0.0030s] [ 28%] 2025-08-14T23:40:40.9597361Z test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_positive_cuda_float32 PASSED [0.0030s] [ 28%] 2025-08-14T23:40:40.9597979Z test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_positive_cuda_int16 PASSED [0.0029s] [ 28%] 2025-08-14T23:40:40.9598587Z test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_positive_cuda_int32 PASSED [0.0027s] [ 28%] 2025-08-14T23:40:40.9599200Z test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_positive_cuda_int8 PASSED [0.0027s] [ 28%] 2025-08-14T23:40:40.9599805Z test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_positive_cuda_uint8 PASSED [0.0027s] [ 28%] 2025-08-14T23:40:40.9600564Z test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_rad2deg_cuda_int32 PASSED [0.0032s] [ 28%] 2025-08-14T23:40:40.9601172Z test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_rad2deg_cuda_int64 PASSED [0.0032s] [ 28%] 2025-08-14T23:40:40.9601762Z test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_round_cuda_int16 PASSED [0.0029s] [ 28%] 2025-08-14T23:40:40.9602351Z test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_round_cuda_int64 PASSED [0.0029s] [ 28%] 2025-08-14T23:40:40.9603016Z test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_sgn_cuda_complex128 PASSED [0.0045s] [ 28%] 2025-08-14T23:40:40.9603625Z test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_sgn_cuda_float16 PASSED [0.0033s] [ 28%] 2025-08-14T23:40:40.9604218Z test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_sgn_cuda_int16 PASSED [0.0030s] [ 28%] 2025-08-14T23:40:40.9604801Z test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_sgn_cuda_int32 PASSED [0.0030s] [ 28%] 2025-08-14T23:40:40.9605371Z test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_sgn_cuda_int64 PASSED [0.0029s] [ 28%] 2025-08-14T23:40:40.9605954Z test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_sign_cuda_bfloat16 PASSED [0.0032s] [ 29%] 2025-08-14T23:40:40.9606553Z test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_sign_cuda_float16 PASSED [0.0032s] [ 29%] 2025-08-14T23:40:40.9607147Z test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_sign_cuda_float32 PASSED [0.0032s] [ 29%] 2025-08-14T23:40:40.9607744Z test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_signbit_cuda_bool PASSED [0.0029s] [ 29%] 2025-08-14T23:40:40.9608352Z test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_signbit_cuda_float16 PASSED [0.0030s] [ 29%] 2025-08-14T23:40:40.9608957Z test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_signbit_cuda_int16 PASSED [0.0031s] [ 29%] 2025-08-14T23:40:40.9609557Z test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_signbit_cuda_int32 PASSED [0.0029s] [ 29%] 2025-08-14T23:40:40.9610150Z test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_signbit_cuda_uint8 PASSED [0.0030s] [ 29%] 2025-08-14T23:40:40.9610745Z test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_sin_cuda_bfloat16 PASSED [0.0032s] [ 29%] 2025-08-14T23:40:40.9611341Z test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_sin_cuda_bool PASSED [0.0032s] [ 29%] 2025-08-14T23:40:40.9612003Z test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_sin_cuda_int64 PASSED [0.0032s] [ 29%] 2025-08-14T23:40:40.9612575Z test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_sin_cuda_int8 PASSED [0.0032s] [ 29%] 2025-08-14T23:40:40.9613166Z test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_sinh_cuda_complex64 PASSED [0.0032s] [ 29%] 2025-08-14T23:40:40.9613835Z test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_sinh_cuda_float32 PASSED [0.0032s] [ 29%] 2025-08-14T23:40:40.9614420Z test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_sinh_cuda_int8 PASSED [0.0033s] [ 29%] 2025-08-14T23:40:40.9615010Z test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_sinh_cuda_uint8 PASSED [0.0032s] [ 30%] 2025-08-14T23:40:40.9615599Z test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_sqrt_cuda_bool PASSED [0.0032s] [ 30%] 2025-08-14T23:40:40.9616200Z test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_sqrt_cuda_complex32 PASSED [0.0033s] [ 30%] 2025-08-14T23:40:40.9616867Z test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_sqrt_cuda_complex64 PASSED [0.0032s] [ 30%] 2025-08-14T23:40:40.9617478Z test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_sqrt_cuda_float32 PASSED [0.0031s] [ 30%] 2025-08-14T23:40:40.9618079Z test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_tan_cuda_bfloat16 PASSED [0.0032s] [ 30%] 2025-08-14T23:40:40.9618676Z test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_tan_cuda_int16 PASSED [0.0032s] [ 30%] 2025-08-14T23:40:40.9619265Z test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_tan_cuda_int32 PASSED [0.0032s] [ 30%] 2025-08-14T23:40:40.9619901Z test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_tan_cuda_uint8 PASSED [0.0033s] [ 30%] 2025-08-14T23:40:40.9620511Z test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_tanh_cuda_bfloat16 PASSED [0.0032s] [ 30%] 2025-08-14T23:40:40.9621126Z test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_tanh_cuda_complex64 PASSED [0.0046s] [ 30%] 2025-08-14T23:40:40.9621735Z test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_tanh_cuda_float64 PASSED [0.0032s] [ 30%] 2025-08-14T23:40:40.9622335Z test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_tanh_cuda_uint8 PASSED [0.0032s] [ 30%] 2025-08-14T23:40:40.9622936Z test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_trunc_cuda_float16 PASSED [0.0032s] [ 30%] 2025-08-14T23:40:40.9623548Z test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_trunc_cuda_float32 PASSED [0.0031s] [ 30%] 2025-08-14T23:40:40.9624146Z test_sparse_csr.py::TestSparseCSRCUDA::test_zero_to_zero_correspondence_unary_trunc_cuda_int64 PASSED [0.0029s] [ 30%] 2025-08-14T23:40:40.9624729Z test_sparse_csr.py::TestSparseCompressedCUDA::test_clone_SparseBSC_cuda_float64 PASSED [0.0422s] [ 31%] 2025-08-14T23:40:40.9625274Z test_sparse_csr.py::TestSparseCompressedCUDA::test_clone_SparseBSC_cuda_int16 PASSED [0.0335s] [ 31%] 2025-08-14T23:40:40.9625801Z test_sparse_csr.py::TestSparseCompressedCUDA::test_clone_SparseBSC_cuda_int8 PASSED [0.0332s] [ 31%] 2025-08-14T23:40:40.9626344Z test_sparse_csr.py::TestSparseCompressedCUDA::test_clone_SparseBSR_cuda_bfloat16 PASSED [0.0416s] [ 31%] 2025-08-14T23:40:40.9626899Z test_sparse_csr.py::TestSparseCompressedCUDA::test_clone_SparseBSR_cuda_complex128 PASSED [0.0440s] [ 31%] 2025-08-14T23:40:40.9627449Z test_sparse_csr.py::TestSparseCompressedCUDA::test_clone_SparseBSR_cuda_float64 PASSED [0.0416s] [ 31%] 2025-08-14T23:40:40.9627981Z test_sparse_csr.py::TestSparseCompressedCUDA::test_clone_SparseBSR_cuda_int64 PASSED [0.0334s] [ 31%] 2025-08-14T23:40:40.9628576Z test_sparse_csr.py::TestSparseCompressedCUDA::test_clone_SparseBSR_cuda_uint8 PASSED [0.0333s] [ 31%] 2025-08-14T23:40:40.9629104Z test_sparse_csr.py::TestSparseCompressedCUDA::test_clone_SparseCSC_cuda_float32 PASSED [0.0397s] [ 31%] 2025-08-14T23:40:40.9629634Z test_sparse_csr.py::TestSparseCompressedCUDA::test_clone_SparseCSC_cuda_uint8 PASSED [0.0316s] [ 31%] 2025-08-14T23:40:40.9630220Z test_sparse_csr.py::TestSparseCompressedCUDA::test_clone_SparseCSR_cuda_float16 PASSED [0.0399s] [ 31%] 2025-08-14T23:40:40.9630758Z test_sparse_csr.py::TestSparseCompressedCUDA::test_clone_SparseCSR_cuda_int8 PASSED [0.0317s] [ 31%] 2025-08-14T23:40:40.9631329Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_abs_cuda_bfloat16 PASSED [0.0045s] [ 31%] 2025-08-14T23:40:40.9631938Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_abs_cuda_float64 PASSED [0.0046s] [ 31%] 2025-08-14T23:40:40.9632538Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_abs_cuda_int64 PASSED [0.0044s] [ 31%] 2025-08-14T23:40:40.9633147Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_angle_cuda_complex32 PASSED [0.0048s] [ 32%] 2025-08-14T23:40:40.9633824Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_angle_cuda_complex64 PASSED [0.0058s] [ 32%] 2025-08-14T23:40:40.9634443Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_angle_cuda_float32 PASSED [0.0046s] [ 32%] 2025-08-14T23:40:40.9635051Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_angle_cuda_int32 PASSED [0.0045s] [ 32%] 2025-08-14T23:40:40.9635655Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_asin_cuda_complex128 PASSED [0.0047s] [ 32%] 2025-08-14T23:40:40.9636265Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_asin_cuda_float16 PASSED [0.0045s] [ 32%] 2025-08-14T23:40:40.9636915Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_asin_cuda_float64 PASSED [0.0046s] [ 32%] 2025-08-14T23:40:40.9637514Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_asin_cuda_int32 PASSED [0.0045s] [ 32%] 2025-08-14T23:40:40.9638105Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_asin_cuda_int64 PASSED [0.0046s] [ 32%] 2025-08-14T23:40:40.9638698Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_asin_cuda_uint8 PASSED [0.0044s] [ 32%] 2025-08-14T23:40:40.9639305Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_asinh_cuda_complex128 PASSED [0.0047s] [ 32%] 2025-08-14T23:40:40.9639928Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_asinh_cuda_complex32 PASSED [0.0045s] [ 32%] 2025-08-14T23:40:40.9640584Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_asinh_cuda_float16 PASSED [0.0046s] [ 32%] 2025-08-14T23:40:40.9641194Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_asinh_cuda_float32 PASSED [0.0045s] [ 32%] 2025-08-14T23:40:40.9641796Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_asinh_cuda_int64 PASSED [0.0046s] [ 32%] 2025-08-14T23:40:40.9642390Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_atan_cuda_bool PASSED [0.0044s] [ 33%] 2025-08-14T23:40:40.9642990Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_atan_cuda_complex64 PASSED [0.0047s] [ 33%] 2025-08-14T23:40:40.9643596Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_atan_cuda_float32 PASSED [0.0045s] [ 33%] 2025-08-14T23:40:40.9644197Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_atan_cuda_float64 PASSED [0.0046s] [ 33%] 2025-08-14T23:40:40.9644787Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_atan_cuda_int8 PASSED [0.0045s] [ 33%] 2025-08-14T23:40:40.9645391Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_atanh_cuda_bfloat16 PASSED [0.0046s] [ 33%] 2025-08-14T23:40:40.9646067Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_atanh_cuda_int16 PASSED [0.0045s] [ 33%] 2025-08-14T23:40:40.9646665Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_atanh_cuda_int8 PASSED [0.0046s] [ 33%] 2025-08-14T23:40:40.9647258Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_atanh_cuda_uint8 PASSED [0.0045s] [ 33%] 2025-08-14T23:40:40.9647919Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_ceil_cuda_bfloat16 PASSED [0.0047s] [ 33%] 2025-08-14T23:40:40.9648519Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_ceil_cuda_float16 PASSED [0.0045s] [ 33%] 2025-08-14T23:40:40.9649114Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_ceil_cuda_int32 PASSED [0.0045s] [ 33%] 2025-08-14T23:40:40.9649744Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_conj_physical_cuda_bfloat16 PASSED [0.0045s] [ 33%] 2025-08-14T23:40:40.9650418Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_conj_physical_cuda_complex128 PASSED [0.0046s] [ 33%] 2025-08-14T23:40:40.9651161Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_conj_physical_cuda_float64 PASSED [0.0044s] [ 33%] 2025-08-14T23:40:40.9651813Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_conj_physical_cuda_int64 PASSED [0.0045s] [ 34%] 2025-08-14T23:40:40.9652454Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_conj_physical_cuda_int8 PASSED [0.0043s] [ 34%] 2025-08-14T23:40:40.9653083Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_deg2rad_cuda_bfloat16 PASSED [0.0046s] [ 34%] 2025-08-14T23:40:40.9653702Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_deg2rad_cuda_float16 PASSED [0.0045s] [ 34%] 2025-08-14T23:40:40.9654380Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_deg2rad_cuda_int32 PASSED [0.0046s] [ 34%] 2025-08-14T23:40:40.9654988Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_deg2rad_cuda_int64 PASSED [0.0045s] [ 34%] 2025-08-14T23:40:40.9655599Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_deg2rad_cuda_int8 PASSED [0.0046s] [ 34%] 2025-08-14T23:40:40.9656201Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_deg2rad_cuda_uint8 PASSED [0.0044s] [ 34%] 2025-08-14T23:40:40.9656802Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_erf_cuda_bool PASSED [0.0046s] [ 34%] 2025-08-14T23:40:40.9657393Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_erf_cuda_int32 PASSED [0.0044s] [ 34%] 2025-08-14T23:40:40.9657984Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_erf_cuda_uint8 PASSED [0.0047s] [ 34%] 2025-08-14T23:40:40.9658590Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_erfinv_cuda_float32 PASSED [0.0045s] [ 34%] 2025-08-14T23:40:40.9659211Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_erfinv_cuda_float64 PASSED [0.0055s] [ 34%] 2025-08-14T23:40:40.9659828Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_erfinv_cuda_int16 PASSED [0.0044s] [ 34%] 2025-08-14T23:40:40.9660429Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_erfinv_cuda_int32 PASSED [0.0046s] [ 34%] 2025-08-14T23:40:40.9661044Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_expm1_cuda_bfloat16 PASSED [0.0044s] [ 35%] 2025-08-14T23:40:40.9661643Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_expm1_cuda_bool PASSED [0.0046s] [ 35%] 2025-08-14T23:40:40.9662257Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_expm1_cuda_complex128 PASSED [0.0045s] [ 35%] 2025-08-14T23:40:40.9662875Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_expm1_cuda_int16 PASSED [0.0047s] [ 35%] 2025-08-14T23:40:40.9663546Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_expm1_cuda_int64 PASSED [0.0045s] [ 35%] 2025-08-14T23:40:40.9664154Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_floor_cuda_float16 PASSED [0.0046s] [ 35%] 2025-08-14T23:40:40.9664763Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_floor_cuda_float32 PASSED [0.0044s] [ 35%] 2025-08-14T23:40:40.9665439Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_frac_cuda_bfloat16 PASSED [0.0046s] [ 35%] 2025-08-14T23:40:40.9666047Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_frac_cuda_float64 PASSED [0.0044s] [ 35%] 2025-08-14T23:40:40.9666654Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_isinf_cuda_bfloat16 PASSED [0.0046s] [ 35%] 2025-08-14T23:40:40.9667274Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_isinf_cuda_complex128 PASSED [0.0044s] [ 35%] 2025-08-14T23:40:40.9667900Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_isinf_cuda_complex32 PASSED [0.0046s] [ 35%] 2025-08-14T23:40:40.9668513Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_isinf_cuda_float32 PASSED [0.0043s] [ 35%] 2025-08-14T23:40:40.9669168Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_isinf_cuda_int16 PASSED [0.0045s] [ 35%] 2025-08-14T23:40:40.9669776Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_isnan_cuda_bool PASSED [0.0043s] [ 35%] 2025-08-14T23:40:40.9670382Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_isnan_cuda_complex64 PASSED [0.0045s] [ 36%] 2025-08-14T23:40:40.9670991Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_isnan_cuda_float32 PASSED [0.0043s] [ 36%] 2025-08-14T23:40:40.9671593Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_isnan_cuda_float64 PASSED [0.0045s] [ 36%] 2025-08-14T23:40:40.9672248Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_isnan_cuda_int64 PASSED [0.0043s] [ 36%] 2025-08-14T23:40:40.9672848Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_isnan_cuda_int8 PASSED [0.0045s] [ 36%] 2025-08-14T23:40:40.9673443Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_isnan_cuda_uint8 PASSED [0.0043s] [ 36%] 2025-08-14T23:40:40.9674042Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_isneginf_cuda_bool PASSED [0.0045s] [ 36%] 2025-08-14T23:40:40.9674652Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_isneginf_cuda_int64 PASSED [0.0043s] [ 36%] 2025-08-14T23:40:40.9675261Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_isposinf_cuda_int32 PASSED [0.0045s] [ 36%] 2025-08-14T23:40:40.9675871Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_isposinf_cuda_int64 PASSED [0.0043s] [ 36%] 2025-08-14T23:40:40.9676480Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_log1p_cuda_bfloat16 PASSED [0.0046s] [ 36%] 2025-08-14T23:40:40.9677092Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_log1p_cuda_complex128 PASSED [0.0045s] [ 36%] 2025-08-14T23:40:40.9677711Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_log1p_cuda_complex64 PASSED [0.0047s] [ 36%] 2025-08-14T23:40:40.9678514Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_masked_amax_cuda_int16 SKIPPED [0.0017s] (masked.amax does not support input with torch.sparse_bsc layout) [ 36%] 2025-08-14T23:40:40.9679484Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_masked_amax_cuda_int32 SKIPPED [0.0017s] (masked.amax does not support input with torch.sparse_bsc layout) [ 36%] 2025-08-14T23:40:40.9680481Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_masked_amax_cuda_int8 SKIPPED [0.0017s] (masked.amax does not support input with torch.sparse_bsc layout) [ 37%] 2025-08-14T23:40:40.9681533Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_masked_amin_cuda_float32 SKIPPED [0.0019s] (masked.amin does not support input with torch.sparse_bsc layout) [ 37%] 2025-08-14T23:40:40.9682502Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_masked_amin_cuda_int8 SKIPPED [0.0017s] (masked.amin does not support input with torch.sparse_bsc layout) [ 37%] 2025-08-14T23:40:40.9683535Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_masked_mean_cuda_complex64 SKIPPED [0.0017s] (masked.mean does not support input with torch.sparse_bsc layout) [ 37%] 2025-08-14T23:40:40.9684517Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_masked_mean_cuda_float32 SKIPPED [0.0017s] (masked.mean does not support input with torch.sparse_bsc layout) [ 37%] 2025-08-14T23:40:40.9685506Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_masked_prod_cuda_complex128 SKIPPED [0.0018s] (masked.prod does not support input with torch.sparse_bsc layout) [ 37%] 2025-08-14T23:40:40.9686563Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_masked_prod_cuda_complex64 SKIPPED [0.0017s] (masked.prod does not support input with torch.sparse_bsc layout) [ 37%] 2025-08-14T23:40:40.9687546Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_masked_prod_cuda_float16 SKIPPED [0.0018s] (masked.prod does not support input with torch.sparse_bsc layout) [ 37%] 2025-08-14T23:40:40.9688520Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_masked_prod_cuda_float32 SKIPPED [0.0017s] (masked.prod does not support input with torch.sparse_bsc layout) [ 37%] 2025-08-14T23:40:40.9689480Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_masked_prod_cuda_int16 SKIPPED [0.0018s] (masked.prod does not support input with torch.sparse_bsc layout) [ 37%] 2025-08-14T23:40:40.9690504Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_masked_prod_cuda_int8 SKIPPED [0.0017s] (masked.prod does not support input with torch.sparse_bsc layout) [ 37%] 2025-08-14T23:40:40.9691486Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_masked_sum_cuda_complex64 SKIPPED [0.0017s] (masked.sum does not support input with torch.sparse_bsc layout) [ 37%] 2025-08-14T23:40:40.9692454Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_masked_sum_cuda_float16 SKIPPED [0.0017s] (masked.sum does not support input with torch.sparse_bsc layout) [ 37%] 2025-08-14T23:40:40.9693415Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_masked_sum_cuda_float64 SKIPPED [0.0018s] (masked.sum does not support input with torch.sparse_bsc layout) [ 37%] 2025-08-14T23:40:40.9694370Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_masked_sum_cuda_int64 SKIPPED [0.0016s] (masked.sum does not support input with torch.sparse_bsc layout) [ 37%] 2025-08-14T23:40:40.9695319Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_masked_sum_cuda_int8 SKIPPED [0.0017s] (masked.sum does not support input with torch.sparse_bsc layout) [ 38%] 2025-08-14T23:40:40.9696268Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_masked_sum_cuda_uint8 SKIPPED [0.0017s] (masked.sum does not support input with torch.sparse_bsc layout) [ 38%] 2025-08-14T23:40:40.9697054Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_mul_cuda_complex32 PASSED [0.0119s] [ 38%] 2025-08-14T23:40:40.9697658Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_mul_cuda_int64 PASSED [0.0082s] [ 38%] 2025-08-14T23:40:40.9698249Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_mul_cuda_uint8 PASSED [0.0080s] [ 38%] 2025-08-14T23:40:40.9698893Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_nn_functional_relu_cuda_float64 PASSED [0.0058s] [ 38%] 2025-08-14T23:40:40.9699621Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_positive_cuda_bfloat16 PASSED [0.0044s] [ 38%] 2025-08-14T23:40:40.9700270Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_positive_cuda_complex128 PASSED [0.0047s] [ 38%] 2025-08-14T23:40:40.9700917Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_positive_cuda_float32 PASSED [0.0044s] [ 38%] 2025-08-14T23:40:40.9701597Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_positive_cuda_int32 PASSED [0.0045s] [ 38%] 2025-08-14T23:40:40.9702219Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_rad2deg_cuda_bfloat16 PASSED [0.0044s] [ 38%] 2025-08-14T23:40:40.9702828Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_rad2deg_cuda_int16 PASSED [0.0046s] [ 38%] 2025-08-14T23:40:40.9703438Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_rad2deg_cuda_int8 PASSED [0.0044s] [ 38%] 2025-08-14T23:40:40.9704074Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_randn_like_cuda_complex128 PASSED [0.0105s] [ 38%] 2025-08-14T23:40:40.9704726Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_randn_like_cuda_complex32 PASSED [0.0079s] [ 38%] 2025-08-14T23:40:40.9705427Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_randn_like_cuda_float32 PASSED [0.0084s] [ 38%] 2025-08-14T23:40:40.9706043Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_round_cuda_int8 PASSED [0.0043s] [ 39%] 2025-08-14T23:40:40.9706637Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_sgn_cuda_bool PASSED [0.0045s] [ 39%] 2025-08-14T23:40:40.9707226Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_sgn_cuda_float32 PASSED [0.0044s] [ 39%] 2025-08-14T23:40:40.9707820Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_sgn_cuda_int16 PASSED [0.0045s] [ 39%] 2025-08-14T23:40:40.9708459Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_sgn_cuda_int8 PASSED [0.0043s] [ 39%] 2025-08-14T23:40:40.9709052Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_sign_cuda_float32 PASSED [0.0046s] [ 39%] 2025-08-14T23:40:40.9709649Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_sign_cuda_int16 PASSED [0.0043s] [ 39%] 2025-08-14T23:40:40.9710238Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_sign_cuda_int64 PASSED [0.0044s] [ 39%] 2025-08-14T23:40:40.9710831Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_sign_cuda_int8 PASSED [0.0043s] [ 39%] 2025-08-14T23:40:40.9711438Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_signbit_cuda_float32 PASSED [0.0044s] [ 39%] 2025-08-14T23:40:40.9712052Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_signbit_cuda_int64 PASSED [0.0043s] [ 39%] 2025-08-14T23:40:40.9712657Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_signbit_cuda_int8 PASSED [0.0044s] [ 39%] 2025-08-14T23:40:40.9713249Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_sin_cuda_bool PASSED [0.0044s] [ 39%] 2025-08-14T23:40:40.9713844Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_sin_cuda_complex32 PASSED [0.0060s] [ 39%] 2025-08-14T23:40:40.9714447Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_sin_cuda_float32 PASSED [0.0044s] [ 39%] 2025-08-14T23:40:40.9715040Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_sinh_cuda_bool PASSED [0.0046s] [ 40%] 2025-08-14T23:40:40.9715644Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_sinh_cuda_complex32 PASSED [0.0058s] [ 40%] 2025-08-14T23:40:40.9716249Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_sinh_cuda_float16 PASSED [0.0046s] [ 40%] 2025-08-14T23:40:40.9716844Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_sinh_cuda_uint8 PASSED [0.0044s] [ 40%] 2025-08-14T23:40:40.9717504Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_sqrt_cuda_complex32 PASSED [0.0046s] [ 40%] 2025-08-14T23:40:40.9718112Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_sqrt_cuda_float32 PASSED [0.0045s] [ 40%] 2025-08-14T23:40:40.9718703Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_sqrt_cuda_int32 PASSED [0.0046s] [ 40%] 2025-08-14T23:40:40.9719344Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_sqrt_cuda_int8 PASSED [0.0044s] [ 40%] 2025-08-14T23:40:40.9719932Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_sqrt_cuda_uint8 PASSED [0.0047s] [ 40%] 2025-08-14T23:40:40.9720557Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_sum_cuda_bool PASSED [0.0117s] [ 40%] 2025-08-14T23:40:40.9721154Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_sum_cuda_complex128 PASSED [0.0139s] [ 40%] 2025-08-14T23:40:40.9721768Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_sum_cuda_complex32 PASSED [0.0163s] [ 40%] 2025-08-14T23:40:40.9722444Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_sum_cuda_float64 PASSED [0.0129s] [ 40%] 2025-08-14T23:40:40.9723042Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_sum_cuda_int64 PASSED [0.0110s] [ 40%] 2025-08-14T23:40:40.9723634Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_tan_cuda_bfloat16 PASSED [0.0044s] [ 40%] 2025-08-14T23:40:40.9724226Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_tan_cuda_bool PASSED [0.0046s] [ 41%] 2025-08-14T23:40:40.9724823Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_tan_cuda_complex128 PASSED [0.0045s] [ 41%] 2025-08-14T23:40:40.9725502Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_tan_cuda_float64 PASSED [0.0046s] [ 41%] 2025-08-14T23:40:40.9726095Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_tan_cuda_int16 PASSED [0.0045s] [ 41%] 2025-08-14T23:40:40.9726687Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_tanh_cuda_float32 PASSED [0.0046s] [ 41%] 2025-08-14T23:40:40.9727285Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_tanh_cuda_float64 PASSED [0.0044s] [ 41%] 2025-08-14T23:40:40.9727875Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_tanh_cuda_int32 PASSED [0.0046s] [ 41%] 2025-08-14T23:40:40.9728458Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_tanh_cuda_uint8 PASSED [0.0045s] [ 41%] 2025-08-14T23:40:40.9729231Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_to_sparse_cuda_float16 SKIPPED [0.0017s] (to_sparse does not support input with torch.sparse_bsc layout) [ 41%] 2025-08-14T23:40:40.9730178Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_to_sparse_cuda_float32 SKIPPED [0.0019s] (to_sparse does not support input with torch.sparse_bsc layout) [ 41%] 2025-08-14T23:40:40.9731117Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_to_sparse_cuda_int32 SKIPPED [0.0017s] (to_sparse does not support input with torch.sparse_bsc layout) [ 41%] 2025-08-14T23:40:40.9731890Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_trunc_cuda_float32 PASSED [0.0045s] [ 41%] 2025-08-14T23:40:40.9732506Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_trunc_cuda_float64 PASSED [0.0046s] [ 41%] 2025-08-14T23:40:40.9733106Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_trunc_cuda_int8 PASSED [0.0043s] [ 41%] 2025-08-14T23:40:40.9733695Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_trunc_cuda_uint8 PASSED [0.0045s] [ 41%] 2025-08-14T23:40:40.9734313Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_zeros_like_cuda_float16 PASSED [0.0066s] [ 42%] 2025-08-14T23:40:40.9735039Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_zeros_like_cuda_int64 PASSED [0.0068s] [ 42%] 2025-08-14T23:40:40.9735661Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSC_zeros_like_cuda_uint8 PASSED [0.0063s] [ 42%] 2025-08-14T23:40:40.9736260Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_abs_cuda_bool PASSED [0.0044s] [ 42%] 2025-08-14T23:40:40.9736920Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_abs_cuda_complex128 PASSED [0.0045s] [ 42%] 2025-08-14T23:40:40.9737525Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_abs_cuda_complex32 PASSED [0.0047s] [ 42%] 2025-08-14T23:40:40.9738125Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_abs_cuda_float32 PASSED [0.0044s] [ 42%] 2025-08-14T23:40:40.9738721Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_abs_cuda_float64 PASSED [0.0046s] [ 42%] 2025-08-14T23:40:40.9739314Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_abs_cuda_int32 PASSED [0.0043s] [ 42%] 2025-08-14T23:40:40.9739947Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_abs_cuda_int64 PASSED [0.0045s] [ 42%] 2025-08-14T23:40:40.9740556Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_angle_cuda_complex32 PASSED [0.0045s] [ 42%] 2025-08-14T23:40:40.9741162Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_angle_cuda_float32 PASSED [0.0046s] [ 42%] 2025-08-14T23:40:40.9741764Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_angle_cuda_float64 PASSED [0.0044s] [ 42%] 2025-08-14T23:40:40.9742360Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_angle_cuda_int32 PASSED [0.0046s] [ 42%] 2025-08-14T23:40:40.9743005Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_angle_cuda_int8 PASSED [0.0044s] [ 42%] 2025-08-14T23:40:40.9743602Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_angle_cuda_uint8 PASSED [0.0046s] [ 43%] 2025-08-14T23:40:40.9744209Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_asin_cuda_complex32 PASSED [0.0045s] [ 43%] 2025-08-14T23:40:40.9744809Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_asin_cuda_int16 PASSED [0.0046s] [ 43%] 2025-08-14T23:40:40.9745398Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_asinh_cuda_bool PASSED [0.0044s] [ 43%] 2025-08-14T23:40:40.9745982Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_asinh_cuda_int32 PASSED [0.0046s] [ 43%] 2025-08-14T23:40:40.9746573Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_asinh_cuda_int8 PASSED [0.0044s] [ 43%] 2025-08-14T23:40:40.9747161Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_asinh_cuda_uint8 PASSED [0.0046s] [ 43%] 2025-08-14T23:40:40.9747755Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_atan_cuda_bool PASSED [0.0044s] [ 43%] 2025-08-14T23:40:40.9748359Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_atan_cuda_complex32 PASSED [0.0046s] [ 43%] 2025-08-14T23:40:40.9748961Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_atan_cuda_int16 PASSED [0.0045s] [ 43%] 2025-08-14T23:40:40.9749551Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_atan_cuda_int64 PASSED [0.0049s] [ 43%] 2025-08-14T23:40:40.9750136Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_atan_cuda_int8 PASSED [0.0044s] [ 43%] 2025-08-14T23:40:40.9750720Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_atan_cuda_uint8 PASSED [0.0046s] [ 43%] 2025-08-14T23:40:40.9751307Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_atanh_cuda_bool PASSED [0.0044s] [ 43%] 2025-08-14T23:40:40.9751975Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_atanh_cuda_complex32 PASSED [0.0048s] [ 43%] 2025-08-14T23:40:40.9752586Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_atanh_cuda_float16 PASSED [0.0045s] [ 44%] 2025-08-14T23:40:40.9753192Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_atanh_cuda_float32 PASSED [0.0046s] [ 44%] 2025-08-14T23:40:40.9753850Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_atanh_cuda_float64 PASSED [0.0045s] [ 44%] 2025-08-14T23:40:40.9754449Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_ceil_cuda_float32 PASSED [0.0046s] [ 44%] 2025-08-14T23:40:40.9755039Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_ceil_cuda_int16 PASSED [0.0043s] [ 44%] 2025-08-14T23:40:40.9755627Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_ceil_cuda_uint8 PASSED [0.0045s] [ 44%] 2025-08-14T23:40:40.9756263Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_conj_physical_cuda_complex32 PASSED [0.0045s] [ 44%] 2025-08-14T23:40:40.9756916Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_conj_physical_cuda_int32 PASSED [0.0044s] [ 44%] 2025-08-14T23:40:40.9757612Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_conj_physical_cuda_int64 PASSED [0.0043s] [ 44%] 2025-08-14T23:40:40.9758236Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_deg2rad_cuda_bool PASSED [0.0047s] [ 44%] 2025-08-14T23:40:40.9758840Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_deg2rad_cuda_float16 PASSED [0.0045s] [ 44%] 2025-08-14T23:40:40.9759461Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_deg2rad_cuda_int16 PASSED [0.0046s] [ 44%] 2025-08-14T23:40:40.9760065Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_deg2rad_cuda_int32 PASSED [0.0045s] [ 44%] 2025-08-14T23:40:40.9760785Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_deg2rad_cuda_uint8 PASSED [0.0046s] [ 44%] 2025-08-14T23:40:40.9761374Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_erf_cuda_bool PASSED [0.0044s] [ 44%] 2025-08-14T23:40:40.9761953Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_erf_cuda_int16 PASSED [0.0046s] [ 45%] 2025-08-14T23:40:40.9762539Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_erf_cuda_int32 PASSED [0.0044s] [ 45%] 2025-08-14T23:40:40.9763120Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_erfinv_cuda_int16 PASSED [0.0046s] [ 45%] 2025-08-14T23:40:40.9763718Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_erfinv_cuda_int32 PASSED [0.0044s] [ 45%] 2025-08-14T23:40:40.9764311Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_erfinv_cuda_int8 PASSED [0.0047s] [ 45%] 2025-08-14T23:40:40.9764906Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_erfinv_cuda_uint8 PASSED [0.0044s] [ 45%] 2025-08-14T23:40:40.9765504Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_expm1_cuda_float32 PASSED [0.0046s] [ 45%] 2025-08-14T23:40:40.9766100Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_expm1_cuda_int16 PASSED [0.0044s] [ 45%] 2025-08-14T23:40:40.9766688Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_expm1_cuda_int32 PASSED [0.0047s] [ 45%] 2025-08-14T23:40:40.9767273Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_expm1_cuda_int64 PASSED [0.0044s] [ 45%] 2025-08-14T23:40:40.9767871Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_floor_cuda_bfloat16 PASSED [0.0046s] [ 45%] 2025-08-14T23:40:40.9768472Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_floor_cuda_float32 PASSED [0.0044s] [ 45%] 2025-08-14T23:40:40.9769064Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_frac_cuda_float32 PASSED [0.0046s] [ 45%] 2025-08-14T23:40:40.9769736Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_frac_cuda_float64 PASSED [0.0045s] [ 45%] 2025-08-14T23:40:40.9770324Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_isinf_cuda_bool PASSED [0.0045s] [ 45%] 2025-08-14T23:40:40.9770923Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_isinf_cuda_complex32 PASSED [0.0044s] [ 46%] 2025-08-14T23:40:40.9771597Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_isinf_cuda_float32 PASSED [0.0045s] [ 46%] 2025-08-14T23:40:40.9772191Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_isinf_cuda_int32 PASSED [0.0043s] [ 46%] 2025-08-14T23:40:40.9772772Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_isinf_cuda_int8 PASSED [0.0045s] [ 46%] 2025-08-14T23:40:40.9773376Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_isnan_cuda_complex128 PASSED [0.0043s] [ 46%] 2025-08-14T23:40:40.9773996Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_isnan_cuda_complex64 PASSED [0.0044s] [ 46%] 2025-08-14T23:40:40.9774672Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_isnan_cuda_float64 PASSED [0.0043s] [ 46%] 2025-08-14T23:40:40.9775287Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_isneginf_cuda_float32 PASSED [0.0044s] [ 46%] 2025-08-14T23:40:40.9775906Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_isneginf_cuda_float64 PASSED [0.0043s] [ 46%] 2025-08-14T23:40:40.9776517Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_isneginf_cuda_int64 PASSED [0.0045s] [ 46%] 2025-08-14T23:40:40.9777133Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_isposinf_cuda_float16 PASSED [0.0044s] [ 46%] 2025-08-14T23:40:40.9777806Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_isposinf_cuda_float32 PASSED [0.0045s] [ 46%] 2025-08-14T23:40:40.9778420Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_log1p_cuda_float64 PASSED [0.0045s] [ 46%] 2025-08-14T23:40:40.9779012Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_log1p_cuda_uint8 PASSED [0.0047s] [ 46%] 2025-08-14T23:40:40.9779799Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_masked_amax_cuda_float16 SKIPPED [0.0017s] (masked.amax does not support input with torch.sparse_bsr layout) [ 46%] 2025-08-14T23:40:40.9780774Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_masked_amax_cuda_float64 SKIPPED [0.0017s] (masked.amax does not support input with torch.sparse_bsr layout) [ 46%] 2025-08-14T23:40:40.9781734Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_masked_amax_cuda_int8 SKIPPED [0.0018s] (masked.amax does not support input with torch.sparse_bsr layout) [ 47%] 2025-08-14T23:40:40.9782694Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_masked_amin_cuda_float64 SKIPPED [0.0017s] (masked.amin does not support input with torch.sparse_bsr layout) [ 47%] 2025-08-14T23:40:40.9783656Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_masked_amin_cuda_int32 SKIPPED [0.0017s] (masked.amin does not support input with torch.sparse_bsr layout) [ 47%] 2025-08-14T23:40:40.9784621Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_masked_mean_cuda_bfloat16 SKIPPED [0.0017s] (masked.mean does not support input with torch.sparse_bsr layout) [ 47%] 2025-08-14T23:40:40.9785613Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_masked_mean_cuda_complex128 SKIPPED [0.0019s] (masked.mean does not support input with torch.sparse_bsr layout) [ 47%] 2025-08-14T23:40:40.9786605Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_masked_mean_cuda_complex64 SKIPPED [0.0017s] (masked.mean does not support input with torch.sparse_bsr layout) [ 47%] 2025-08-14T23:40:40.9787644Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_masked_mean_cuda_float16 SKIPPED [0.0017s] (masked.mean does not support input with torch.sparse_bsr layout) [ 47%] 2025-08-14T23:40:40.9788620Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_masked_prod_cuda_bfloat16 SKIPPED [0.0017s] (masked.prod does not support input with torch.sparse_bsr layout) [ 47%] 2025-08-14T23:40:40.9789642Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_masked_prod_cuda_bool SKIPPED [0.0018s] (masked.prod does not support input with torch.sparse_bsr layout) [ 47%] 2025-08-14T23:40:40.9790607Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_masked_prod_cuda_float32 SKIPPED [0.0017s] (masked.prod does not support input with torch.sparse_bsr layout) [ 47%] 2025-08-14T23:40:40.9791574Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_masked_prod_cuda_float64 SKIPPED [0.0017s] (masked.prod does not support input with torch.sparse_bsr layout) [ 47%] 2025-08-14T23:40:40.9792589Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_masked_prod_cuda_int16 SKIPPED [0.0017s] (masked.prod does not support input with torch.sparse_bsr layout) [ 47%] 2025-08-14T23:40:40.9793550Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_masked_prod_cuda_int8 SKIPPED [0.0018s] (masked.prod does not support input with torch.sparse_bsr layout) [ 47%] 2025-08-14T23:40:40.9794502Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_masked_prod_cuda_uint8 SKIPPED [0.0017s] (masked.prod does not support input with torch.sparse_bsr layout) [ 47%] 2025-08-14T23:40:40.9795450Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_masked_sum_cuda_int16 SKIPPED [0.0017s] (masked.sum does not support input with torch.sparse_bsr layout) [ 47%] 2025-08-14T23:40:40.9796300Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_mul_cuda_complex128 PASSED [0.0093s] [ 48%] 2025-08-14T23:40:40.9796916Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_mul_cuda_float16 PASSED [0.0088s] [ 48%] 2025-08-14T23:40:40.9797508Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_mul_cuda_float32 PASSED [0.0089s] [ 48%] 2025-08-14T23:40:40.9798099Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_mul_cuda_int16 PASSED [0.0081s] [ 48%] 2025-08-14T23:40:40.9798679Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_mul_cuda_int64 PASSED [0.0079s] [ 48%] 2025-08-14T23:40:40.9799260Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_mul_cuda_uint8 PASSED [0.0081s] [ 48%] 2025-08-14T23:40:40.9799842Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_neg_cuda_bfloat16 PASSED [0.0044s] [ 48%] 2025-08-14T23:40:40.9800490Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_neg_cuda_float16 PASSED [0.0046s] [ 48%] 2025-08-14T23:40:40.9801080Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_neg_cuda_float64 PASSED [0.0045s] [ 48%] 2025-08-14T23:40:40.9801659Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_neg_cuda_int32 PASSED [0.0044s] [ 48%] 2025-08-14T23:40:40.9802234Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_neg_cuda_int8 PASSED [0.0043s] [ 48%] 2025-08-14T23:40:40.9802864Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_nn_functional_relu_cuda_float64 PASSED [0.0057s] [ 48%] 2025-08-14T23:40:40.9803526Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_positive_cuda_complex128 PASSED [0.0045s] [ 48%] 2025-08-14T23:40:40.9804164Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_positive_cuda_float32 PASSED [0.0048s] [ 48%] 2025-08-14T23:40:40.9804782Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_positive_cuda_int64 PASSED [0.0044s] [ 48%] 2025-08-14T23:40:40.9805468Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_positive_cuda_int8 PASSED [0.0044s] [ 49%] 2025-08-14T23:40:40.9806075Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_positive_cuda_uint8 PASSED [0.0043s] [ 49%] 2025-08-14T23:40:40.9806769Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_rad2deg_cuda_float16 PASSED [0.0046s] [ 49%] 2025-08-14T23:40:40.9807379Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_rad2deg_cuda_float32 PASSED [0.0044s] [ 49%] 2025-08-14T23:40:40.9807981Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_rad2deg_cuda_int32 PASSED [0.0047s] [ 49%] 2025-08-14T23:40:40.9808577Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_rad2deg_cuda_int64 PASSED [0.0045s] [ 49%] 2025-08-14T23:40:40.9809195Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_randn_like_cuda_bfloat16 PASSED [0.0079s] [ 49%] 2025-08-14T23:40:40.9809841Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_randn_like_cuda_complex128 PASSED [0.0078s] [ 49%] 2025-08-14T23:40:40.9810552Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_randn_like_cuda_float64 PASSED [0.0079s] [ 49%] 2025-08-14T23:40:40.9811176Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_round_cuda_float32 PASSED [0.0044s] [ 49%] 2025-08-14T23:40:40.9811775Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_round_cuda_float64 PASSED [0.0046s] [ 49%] 2025-08-14T23:40:40.9812365Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_round_cuda_int8 PASSED [0.0043s] [ 49%] 2025-08-14T23:40:40.9812963Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sgn_cuda_complex128 PASSED [0.0047s] [ 49%] 2025-08-14T23:40:40.9813627Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sgn_cuda_float64 PASSED [0.0045s] [ 49%] 2025-08-14T23:40:40.9814221Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sgn_cuda_int16 PASSED [0.0045s] [ 49%] 2025-08-14T23:40:40.9814803Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sgn_cuda_int32 PASSED [0.0043s] [ 50%] 2025-08-14T23:40:40.9815378Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sgn_cuda_int64 PASSED [0.0045s] [ 50%] 2025-08-14T23:40:40.9815969Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sign_cuda_float32 PASSED [0.0044s] [ 50%] 2025-08-14T23:40:40.9816565Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sign_cuda_float64 PASSED [0.0046s] [ 50%] 2025-08-14T23:40:40.9817153Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sign_cuda_int16 PASSED [0.0043s] [ 50%] 2025-08-14T23:40:40.9817735Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sign_cuda_int32 PASSED [0.0045s] [ 50%] 2025-08-14T23:40:40.9818317Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sign_cuda_int8 PASSED [0.0043s] [ 50%] 2025-08-14T23:40:40.9818896Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sign_cuda_uint8 PASSED [0.0046s] [ 50%] 2025-08-14T23:40:40.9819486Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_signbit_cuda_bool PASSED [0.0044s] [ 50%] 2025-08-14T23:40:40.9820097Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_signbit_cuda_float16 PASSED [0.2099s] [ 50%] 2025-08-14T23:40:40.9820703Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_signbit_cuda_float32 PASSED [0.0046s] [ 50%] 2025-08-14T23:40:40.9821302Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sin_cuda_bool PASSED [0.0045s] [ 50%] 2025-08-14T23:40:40.9821891Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sin_cuda_complex32 PASSED [0.0047s] [ 50%] 2025-08-14T23:40:40.9822545Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sin_cuda_int8 PASSED [0.0045s] [ 50%] 2025-08-14T23:40:40.9823130Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sinh_cuda_bfloat16 PASSED [0.0047s] [ 50%] 2025-08-14T23:40:40.9823728Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sinh_cuda_complex32 PASSED [0.0045s] [ 51%] 2025-08-14T23:40:40.9824384Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sinh_cuda_float16 PASSED [0.0046s] [ 51%] 2025-08-14T23:40:40.9824972Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sinh_cuda_int16 PASSED [0.0045s] [ 51%] 2025-08-14T23:40:40.9825553Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sinh_cuda_int64 PASSED [0.0047s] [ 51%] 2025-08-14T23:40:40.9826140Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sinh_cuda_uint8 PASSED [0.0045s] [ 51%] 2025-08-14T23:40:40.9826731Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sqrt_cuda_bfloat16 PASSED [0.0047s] [ 51%] 2025-08-14T23:40:40.9827384Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sqrt_cuda_complex64 PASSED [0.0045s] [ 51%] 2025-08-14T23:40:40.9827983Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sqrt_cuda_int16 PASSED [0.0046s] [ 51%] 2025-08-14T23:40:40.9828567Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sqrt_cuda_int64 PASSED [0.0045s] [ 51%] 2025-08-14T23:40:40.9829148Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sqrt_cuda_int8 PASSED [0.0048s] [ 51%] 2025-08-14T23:40:40.9829726Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sqrt_cuda_uint8 PASSED [0.0045s] [ 51%] 2025-08-14T23:40:40.9830357Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sum_cuda_bool PASSED [0.0114s] [ 51%] 2025-08-14T23:40:40.9830944Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sum_cuda_float32 PASSED [0.0125s] [ 51%] 2025-08-14T23:40:40.9831529Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sum_cuda_int16 PASSED [0.0114s] [ 51%] 2025-08-14T23:40:40.9832108Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_sum_cuda_int8 PASSED [0.0113s] [ 51%] 2025-08-14T23:40:40.9832685Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_tan_cuda_bool PASSED [0.0048s] [ 52%] 2025-08-14T23:40:40.9833271Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_tan_cuda_complex64 PASSED [0.0062s] [ 52%] 2025-08-14T23:40:40.9833862Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_tan_cuda_float64 PASSED [0.0047s] [ 52%] 2025-08-14T23:40:40.9834117Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_tan_cuda_uint8 PASSED [0.0045s] [ 52%] 2025-08-14T23:40:40.9834385Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_tanh_cuda_bfloat16 PASSED [0.0047s] [ 52%] 2025-08-14T23:40:40.9834640Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_tanh_cuda_bool PASSED [0.0045s] [ 52%] 2025-08-14T23:40:40.9834915Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_tanh_cuda_complex128 PASSED [0.0047s] [ 52%] 2025-08-14T23:40:40.9835184Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_tanh_cuda_complex32 PASSED [0.0058s] [ 52%] 2025-08-14T23:40:40.9835452Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_tanh_cuda_complex64 PASSED [0.0047s] [ 52%] 2025-08-14T23:40:40.9835711Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_tanh_cuda_float64 PASSED [0.0045s] [ 52%] 2025-08-14T23:40:40.9835969Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_tanh_cuda_int32 PASSED [0.0046s] [ 52%] 2025-08-14T23:40:40.9836223Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_tanh_cuda_int64 PASSED [0.0045s] [ 52%] 2025-08-14T23:40:40.9836739Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_to_sparse_cuda_bfloat16 SKIPPED [0.0017s] (to_sparse does not support input with torch.sparse_bsr layout) [ 52%] 2025-08-14T23:40:40.9837184Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_to_sparse_cuda_complex128 SKIPPED [0.0019s] (to_sparse does not support input with torch.sparse_bsr layout) [ 52%] 2025-08-14T23:40:40.9837676Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_to_sparse_cuda_float32 SKIPPED [0.0017s] (to_sparse does not support input with torch.sparse_bsr layout) [ 52%] 2025-08-14T23:40:40.9838103Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_to_sparse_cuda_int32 SKIPPED [0.0017s] (to_sparse does not support input with torch.sparse_bsr layout) [ 53%] 2025-08-14T23:40:40.9838532Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_to_sparse_cuda_int64 SKIPPED [0.0017s] (to_sparse does not support input with torch.sparse_bsr layout) [ 53%] 2025-08-14T23:40:40.9838803Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_trunc_cuda_bfloat16 PASSED [0.0046s] [ 53%] 2025-08-14T23:40:40.9839117Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_trunc_cuda_float32 PASSED [0.0046s] [ 53%] 2025-08-14T23:40:40.9839383Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_trunc_cuda_float64 PASSED [0.0046s] [ 53%] 2025-08-14T23:40:40.9839644Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_trunc_cuda_int16 PASSED [0.0043s] [ 53%] 2025-08-14T23:40:40.9839901Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_trunc_cuda_int8 PASSED [0.0045s] [ 53%] 2025-08-14T23:40:40.9840155Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_trunc_cuda_uint8 PASSED [0.0043s] [ 53%] 2025-08-14T23:40:40.9840563Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_zeros_like_cuda_complex64 PASSED [0.0068s] [ 53%] 2025-08-14T23:40:40.9840848Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_zeros_like_cuda_float32 PASSED [0.0066s] [ 53%] 2025-08-14T23:40:40.9841124Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseBSR_zeros_like_cuda_int8 PASSED [0.0066s] [ 53%] 2025-08-14T23:40:40.9841375Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_abs_cuda_bool PASSED [0.0050s] [ 53%] 2025-08-14T23:40:40.9841647Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_abs_cuda_complex128 PASSED [0.0055s] [ 53%] 2025-08-14T23:40:40.9841901Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_abs_cuda_float16 PASSED [0.0050s] [ 53%] 2025-08-14T23:40:40.9842161Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_abs_cuda_uint8 PASSED [0.0051s] [ 53%] 2025-08-14T23:40:40.9842426Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_angle_cuda_float64 PASSED [0.0052s] [ 53%] 2025-08-14T23:40:40.9842684Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_angle_cuda_int16 PASSED [0.0052s] [ 54%] 2025-08-14T23:40:40.9842942Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_angle_cuda_int64 PASSED [0.0050s] [ 54%] 2025-08-14T23:40:40.9843203Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_angle_cuda_int8 PASSED [0.0052s] [ 54%] 2025-08-14T23:40:40.9843461Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_angle_cuda_uint8 PASSED [0.0050s] [ 54%] 2025-08-14T23:40:40.9843736Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_asin_cuda_complex128 PASSED [0.0052s] [ 54%] 2025-08-14T23:40:40.9843993Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_asin_cuda_float32 PASSED [0.0050s] [ 54%] 2025-08-14T23:40:40.9844255Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_asin_cuda_float64 PASSED [0.0052s] [ 54%] 2025-08-14T23:40:40.9844576Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_asin_cuda_int16 PASSED [0.0050s] [ 54%] 2025-08-14T23:40:40.9844837Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_asin_cuda_int32 PASSED [0.0052s] [ 54%] 2025-08-14T23:40:40.9845155Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_asin_cuda_uint8 PASSED [0.0048s] [ 54%] 2025-08-14T23:40:40.9845433Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_asinh_cuda_bfloat16 PASSED [0.0052s] [ 54%] 2025-08-14T23:40:40.9845687Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_asinh_cuda_bool PASSED [0.0050s] [ 54%] 2025-08-14T23:40:40.9845954Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_asinh_cuda_float32 PASSED [0.0053s] [ 54%] 2025-08-14T23:40:40.9846210Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_asinh_cuda_int16 PASSED [0.0049s] [ 54%] 2025-08-14T23:40:40.9846475Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_atan_cuda_bfloat16 PASSED [0.0052s] [ 54%] 2025-08-14T23:40:40.9846826Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_atan_cuda_complex128 PASSED [0.0050s] [ 55%] 2025-08-14T23:40:40.9847092Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_atan_cuda_float64 PASSED [0.0052s] [ 55%] 2025-08-14T23:40:40.9847346Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_atan_cuda_int8 PASSED [0.0049s] [ 55%] 2025-08-14T23:40:40.9847602Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_atanh_cuda_bool PASSED [0.0051s] [ 55%] 2025-08-14T23:40:40.9847867Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_atanh_cuda_complex32 PASSED [0.0050s] [ 55%] 2025-08-14T23:40:40.9848189Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_atanh_cuda_complex64 PASSED [0.0053s] [ 55%] 2025-08-14T23:40:40.9848454Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_atanh_cuda_float64 PASSED [0.0049s] [ 55%] 2025-08-14T23:40:40.9848716Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_atanh_cuda_int16 PASSED [0.0051s] [ 55%] 2025-08-14T23:40:40.9848973Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_ceil_cuda_float32 PASSED [0.0049s] [ 55%] 2025-08-14T23:40:40.9849237Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_ceil_cuda_int16 PASSED [0.0050s] [ 55%] 2025-08-14T23:40:40.9849490Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_ceil_cuda_int32 PASSED [0.0049s] [ 55%] 2025-08-14T23:40:40.9849745Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_ceil_cuda_int8 PASSED [0.0050s] [ 55%] 2025-08-14T23:40:40.9850028Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_conj_physical_cuda_bool PASSED [0.0047s] [ 55%] 2025-08-14T23:40:40.9850325Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_conj_physical_cuda_float16 PASSED [0.0051s] [ 55%] 2025-08-14T23:40:40.9850621Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_conj_physical_cuda_float32 PASSED [0.0049s] [ 55%] 2025-08-14T23:40:40.9850905Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_conj_physical_cuda_int32 PASSED [0.0049s] [ 56%] 2025-08-14T23:40:40.9851178Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_deg2rad_cuda_bfloat16 PASSED [0.0049s] [ 56%] 2025-08-14T23:40:40.9851449Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_deg2rad_cuda_float16 PASSED [0.0051s] [ 56%] 2025-08-14T23:40:40.9851714Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_deg2rad_cuda_float64 PASSED [0.0050s] [ 56%] 2025-08-14T23:40:40.9851980Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_deg2rad_cuda_int64 PASSED [0.0052s] [ 56%] 2025-08-14T23:40:40.9852297Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_deg2rad_cuda_uint8 PASSED [0.0049s] [ 56%] 2025-08-14T23:40:40.9852554Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_erf_cuda_bool PASSED [0.0052s] [ 56%] 2025-08-14T23:40:40.9852819Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_erfinv_cuda_float32 PASSED [0.0051s] [ 56%] 2025-08-14T23:40:40.9853137Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_erfinv_cuda_int32 PASSED [0.0052s] [ 56%] 2025-08-14T23:40:40.9853402Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_expm1_cuda_bfloat16 PASSED [0.0049s] [ 56%] 2025-08-14T23:40:40.9853682Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_expm1_cuda_complex128 PASSED [0.0052s] [ 56%] 2025-08-14T23:40:40.9853946Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_expm1_cuda_float32 PASSED [0.0049s] [ 56%] 2025-08-14T23:40:40.9854212Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_expm1_cuda_float64 PASSED [0.0052s] [ 56%] 2025-08-14T23:40:40.9854518Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_expm1_cuda_int32 PASSED [0.0049s] [ 56%] 2025-08-14T23:40:40.9854779Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_expm1_cuda_int8 PASSED [0.0052s] [ 56%] 2025-08-14T23:40:40.9855037Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_expm1_cuda_uint8 PASSED [0.0049s] [ 57%] 2025-08-14T23:40:40.9855302Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_floor_cuda_float16 PASSED [0.0051s] [ 57%] 2025-08-14T23:40:40.9855559Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_floor_cuda_float32 PASSED [0.0049s] [ 57%] 2025-08-14T23:40:40.9855870Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_floor_cuda_int32 PASSED [0.0050s] [ 57%] 2025-08-14T23:40:40.9856130Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_floor_cuda_int64 PASSED [0.0048s] [ 57%] 2025-08-14T23:40:40.9856398Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_frac_cuda_bfloat16 PASSED [0.0052s] [ 57%] 2025-08-14T23:40:40.9856657Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_frac_cuda_float16 PASSED [0.0049s] [ 57%] 2025-08-14T23:40:40.9856923Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_frac_cuda_float64 PASSED [0.0051s] [ 57%] 2025-08-14T23:40:40.9857191Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_isinf_cuda_complex32 PASSED [0.0049s] [ 57%] 2025-08-14T23:40:40.9857455Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_isinf_cuda_float16 PASSED [0.0052s] [ 57%] 2025-08-14T23:40:40.9857714Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_isinf_cuda_float64 PASSED [0.0048s] [ 57%] 2025-08-14T23:40:40.9857984Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_isnan_cuda_bfloat16 PASSED [0.0051s] [ 57%] 2025-08-14T23:40:40.9858243Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_isnan_cuda_bool PASSED [0.0048s] [ 57%] 2025-08-14T23:40:40.9858499Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_isnan_cuda_int16 PASSED [0.0050s] [ 57%] 2025-08-14T23:40:40.9858759Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_isnan_cuda_int32 PASSED [0.0048s] [ 57%] 2025-08-14T23:40:40.9859014Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_isnan_cuda_uint8 PASSED [0.0050s] [ 58%] 2025-08-14T23:40:40.9859294Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_isneginf_cuda_bfloat16 PASSED [0.0049s] [ 58%] 2025-08-14T23:40:40.9859556Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_isneginf_cuda_bool PASSED [0.0051s] [ 58%] 2025-08-14T23:40:40.9859889Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_isneginf_cuda_float16 PASSED [0.0051s] [ 58%] 2025-08-14T23:40:40.9860153Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_isneginf_cuda_int16 PASSED [0.0050s] [ 58%] 2025-08-14T23:40:40.9860416Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_isneginf_cuda_int8 PASSED [0.0048s] [ 58%] 2025-08-14T23:40:40.9860746Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_isposinf_cuda_float32 PASSED [0.0051s] [ 58%] 2025-08-14T23:40:40.9861012Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_isposinf_cuda_int16 PASSED [0.0048s] [ 58%] 2025-08-14T23:40:40.9861277Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_isposinf_cuda_int64 PASSED [0.0051s] [ 58%] 2025-08-14T23:40:40.9861542Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_isposinf_cuda_int8 PASSED [0.0049s] [ 58%] 2025-08-14T23:40:40.9861807Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_isposinf_cuda_uint8 PASSED [0.0052s] [ 58%] 2025-08-14T23:40:40.9862123Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_log1p_cuda_bfloat16 PASSED [0.0050s] [ 58%] 2025-08-14T23:40:40.9862388Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_log1p_cuda_float16 PASSED [0.0051s] [ 58%] 2025-08-14T23:40:40.9862653Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_log1p_cuda_float64 PASSED [0.0051s] [ 58%] 2025-08-14T23:40:40.9863101Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_masked_amax_cuda_int32 SKIPPED [0.0019s] (masked.amax does not support input with torch.sparse_csc layout) [ 58%] 2025-08-14T23:40:40.9863605Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_masked_amin_cuda_bfloat16 SKIPPED [0.0017s] (masked.amin does not support input with torch.sparse_csc layout) [ 59%] 2025-08-14T23:40:40.9864056Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_masked_amin_cuda_float32 SKIPPED [0.0017s] (masked.amin does not support input with torch.sparse_csc layout) [ 59%] 2025-08-14T23:40:40.9864505Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_masked_amin_cuda_float64 SKIPPED [0.0017s] (masked.amin does not support input with torch.sparse_csc layout) [ 59%] 2025-08-14T23:40:40.9864946Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_masked_amin_cuda_int16 SKIPPED [0.0018s] (masked.amin does not support input with torch.sparse_csc layout) [ 59%] 2025-08-14T23:40:40.9865387Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_masked_amin_cuda_uint8 SKIPPED [0.0017s] (masked.amin does not support input with torch.sparse_csc layout) [ 59%] 2025-08-14T23:40:40.9865834Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_masked_mean_cuda_float32 SKIPPED [0.0017s] (masked.mean does not support input with torch.sparse_csc layout) [ 59%] 2025-08-14T23:40:40.9866278Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_masked_prod_cuda_bool SKIPPED [0.0017s] (masked.prod does not support input with torch.sparse_csc layout) [ 59%] 2025-08-14T23:40:40.9866718Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_masked_prod_cuda_int16 SKIPPED [0.0018s] (masked.prod does not support input with torch.sparse_csc layout) [ 59%] 2025-08-14T23:40:40.9867160Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_masked_prod_cuda_int64 SKIPPED [0.0017s] (masked.prod does not support input with torch.sparse_csc layout) [ 59%] 2025-08-14T23:40:40.9867597Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_masked_sum_cuda_int16 SKIPPED [0.0017s] (masked.sum does not support input with torch.sparse_csc layout) [ 59%] 2025-08-14T23:40:40.9868028Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_masked_sum_cuda_int32 SKIPPED [0.0017s] (masked.sum does not support input with torch.sparse_csc layout) [ 59%] 2025-08-14T23:40:40.9868353Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_mul_cuda_complex128 PASSED [0.0146s] [ 59%] 2025-08-14T23:40:40.9868619Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_mul_cuda_complex64 PASSED [0.0105s] [ 59%] 2025-08-14T23:40:40.9868951Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_mul_cuda_float64 PASSED [0.0101s] [ 59%] 2025-08-14T23:40:40.9869202Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_mul_cuda_int64 PASSED [0.0101s] [ 59%] 2025-08-14T23:40:40.9869465Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_neg_cuda_complex32 PASSED [0.0050s] [ 60%] 2025-08-14T23:40:40.9869723Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_neg_cuda_complex64 PASSED [0.0053s] [ 60%] 2025-08-14T23:40:40.9869984Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_neg_cuda_float32 PASSED [0.0049s] [ 60%] 2025-08-14T23:40:40.9870236Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_neg_cuda_int64 PASSED [0.0050s] [ 60%] 2025-08-14T23:40:40.9870541Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_neg_cuda_int8 PASSED [0.0048s] [ 60%] 2025-08-14T23:40:40.9870851Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_nn_functional_relu_cuda_bfloat16 PASSED [0.0068s] [ 60%] 2025-08-14T23:40:40.9871156Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_nn_functional_relu_cuda_float16 PASSED [0.0066s] [ 60%] 2025-08-14T23:40:40.9871455Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_nn_functional_relu_cuda_int16 PASSED [0.0067s] [ 60%] 2025-08-14T23:40:40.9871806Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_nn_functional_relu_cuda_int32 PASSED [0.0063s] [ 60%] 2025-08-14T23:40:40.9872102Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_nn_functional_relu_cuda_int64 PASSED [0.0065s] [ 60%] 2025-08-14T23:40:40.9872400Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_nn_functional_relu_cuda_uint8 PASSED [0.0063s] [ 60%] 2025-08-14T23:40:40.9872677Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_positive_cuda_bfloat16 PASSED [0.0051s] [ 60%] 2025-08-14T23:40:40.9872998Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_positive_cuda_complex64 PASSED [0.0049s] [ 60%] 2025-08-14T23:40:40.9873322Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_positive_cuda_float16 PASSED [0.0050s] [ 60%] 2025-08-14T23:40:40.9873666Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_positive_cuda_float32 PASSED [0.0049s] [ 60%] 2025-08-14T23:40:40.9874016Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_positive_cuda_int16 PASSED [0.0051s] [ 61%] 2025-08-14T23:40:40.9874314Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_positive_cuda_int32 PASSED [0.0047s] [ 61%] 2025-08-14T23:40:40.9874646Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_rad2deg_cuda_bfloat16 PASSED [0.0051s] [ 61%] 2025-08-14T23:40:40.9874958Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_rad2deg_cuda_bool PASSED [0.0050s] [ 61%] 2025-08-14T23:40:40.9875297Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_rad2deg_cuda_int16 PASSED [0.0051s] [ 61%] 2025-08-14T23:40:40.9875666Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_randn_like_cuda_bfloat16 PASSED [0.0086s] [ 61%] 2025-08-14T23:40:40.9875984Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_randn_like_cuda_complex32 PASSED [0.0088s] [ 61%] 2025-08-14T23:40:40.9876339Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_randn_like_cuda_float16 PASSED [0.0085s] [ 61%] 2025-08-14T23:40:40.9876695Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_round_cuda_float64 PASSED [0.0051s] [ 61%] 2025-08-14T23:40:40.9877026Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_round_cuda_int32 PASSED [0.0048s] [ 61%] 2025-08-14T23:40:40.9877328Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_round_cuda_int64 PASSED [0.0049s] [ 61%] 2025-08-14T23:40:40.9877703Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_round_cuda_int8 PASSED [0.0048s] [ 61%] 2025-08-14T23:40:40.9878007Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sgn_cuda_bfloat16 PASSED [0.0051s] [ 61%] 2025-08-14T23:40:40.9878317Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sgn_cuda_bool PASSED [0.0048s] [ 61%] 2025-08-14T23:40:40.9878591Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sgn_cuda_float32 PASSED [0.0052s] [ 61%] 2025-08-14T23:40:40.9878947Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sgn_cuda_float64 PASSED [0.0049s] [ 61%] 2025-08-14T23:40:40.9879293Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sign_cuda_bfloat16 PASSED [0.0054s] [ 62%] 2025-08-14T23:40:40.9879626Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sign_cuda_bool PASSED [0.0048s] [ 62%] 2025-08-14T23:40:40.9879921Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sign_cuda_float32 PASSED [0.0051s] [ 62%] 2025-08-14T23:40:40.9880223Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sign_cuda_uint8 PASSED [0.0048s] [ 62%] 2025-08-14T23:40:40.9880594Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_signbit_cuda_bfloat16 PASSED [0.0051s] [ 62%] 2025-08-14T23:40:40.9881020Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_signbit_cuda_float16 PASSED [0.0049s] [ 62%] 2025-08-14T23:40:40.9881321Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_signbit_cuda_int32 PASSED [0.0050s] [ 62%] 2025-08-14T23:40:40.9881645Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_signbit_cuda_int8 PASSED [0.0048s] [ 62%] 2025-08-14T23:40:40.9881946Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sin_cuda_complex128 PASSED [0.0052s] [ 62%] 2025-08-14T23:40:40.9882269Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sin_cuda_int16 PASSED [0.0049s] [ 62%] 2025-08-14T23:40:40.9882577Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sin_cuda_int64 PASSED [0.0051s] [ 62%] 2025-08-14T23:40:40.9882899Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sinh_cuda_bfloat16 PASSED [0.0049s] [ 62%] 2025-08-14T23:40:40.9883194Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sinh_cuda_complex64 PASSED [0.0051s] [ 62%] 2025-08-14T23:40:40.9883521Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sinh_cuda_float16 PASSED [0.0051s] [ 62%] 2025-08-14T23:40:40.9883797Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sinh_cuda_int32 PASSED [0.0051s] [ 62%] 2025-08-14T23:40:40.9884167Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sinh_cuda_int64 PASSED [0.0049s] [ 63%] 2025-08-14T23:40:40.9884451Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sinh_cuda_int8 PASSED [0.0051s] [ 63%] 2025-08-14T23:40:40.9884776Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sqrt_cuda_bfloat16 PASSED [0.0049s] [ 63%] 2025-08-14T23:40:40.9885061Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sqrt_cuda_float16 PASSED [0.0051s] [ 63%] 2025-08-14T23:40:40.9885366Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sqrt_cuda_int32 PASSED [0.0049s] [ 63%] 2025-08-14T23:40:40.9885673Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sqrt_cuda_int64 PASSED [0.0051s] [ 63%] 2025-08-14T23:40:40.9886106Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sum_cuda_bfloat16 PASSED [0.0133s] [ 63%] 2025-08-14T23:40:40.9886406Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sum_cuda_complex128 PASSED [0.0131s] [ 63%] 2025-08-14T23:40:40.9886797Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sum_cuda_complex32 PASSED [0.0134s] [ 63%] 2025-08-14T23:40:40.9887078Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sum_cuda_int16 PASSED [0.0130s] [ 63%] 2025-08-14T23:40:40.9887419Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sum_cuda_int32 PASSED [0.0130s] [ 63%] 2025-08-14T23:40:40.9887717Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sum_cuda_int64 PASSED [0.0130s] [ 63%] 2025-08-14T23:40:40.9888031Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_sum_cuda_uint8 PASSED [0.0128s] [ 63%] 2025-08-14T23:40:40.9888324Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_tan_cuda_float16 PASSED [0.0051s] [ 63%] 2025-08-14T23:40:40.9888723Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_tan_cuda_float64 PASSED [0.0049s] [ 63%] 2025-08-14T23:40:40.9889049Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_tan_cuda_int16 PASSED [0.0051s] [ 64%] 2025-08-14T23:40:40.9889344Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_tan_cuda_int8 PASSED [0.0049s] [ 64%] 2025-08-14T23:40:40.9889667Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_tanh_cuda_float32 PASSED [0.0051s] [ 64%] 2025-08-14T23:40:40.9889952Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_tanh_cuda_int64 PASSED [0.0049s] [ 64%] 2025-08-14T23:40:40.9890356Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_to_sparse_cuda_bfloat16 PASSED [0.0231s] [ 64%] 2025-08-14T23:40:40.9890653Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_to_sparse_cuda_float16 PASSED [0.0058s] [ 64%] 2025-08-14T23:40:40.9891020Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_trunc_cuda_bfloat16 PASSED [0.0051s] [ 64%] 2025-08-14T23:40:40.9891316Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_trunc_cuda_float32 PASSED [0.0049s] [ 64%] 2025-08-14T23:40:40.9891650Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_trunc_cuda_int32 PASSED [0.0050s] [ 64%] 2025-08-14T23:40:40.9891938Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_trunc_cuda_uint8 PASSED [0.0048s] [ 64%] 2025-08-14T23:40:40.9892254Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_zeros_like_cuda_bool PASSED [0.0072s] [ 64%] 2025-08-14T23:40:40.9892593Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_zeros_like_cuda_complex128 PASSED [0.0073s] [ 64%] 2025-08-14T23:40:40.9892954Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_zeros_like_cuda_complex32 PASSED [0.0075s] [ 64%] 2025-08-14T23:40:40.9893255Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_zeros_like_cuda_int64 PASSED [0.0070s] [ 64%] 2025-08-14T23:40:40.9893598Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSC_zeros_like_cuda_uint8 PASSED [0.0072s] [ 64%] 2025-08-14T23:40:40.9893881Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_abs_cuda_bool PASSED [0.0048s] [ 65%] 2025-08-14T23:40:40.9894211Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_abs_cuda_complex64 PASSED [0.0054s] [ 65%] 2025-08-14T23:40:40.9894511Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_abs_cuda_float32 PASSED [0.0049s] [ 65%] 2025-08-14T23:40:40.9894849Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_abs_cuda_float64 PASSED [0.0051s] [ 65%] 2025-08-14T23:40:40.9895192Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_abs_cuda_uint8 PASSED [0.0048s] [ 65%] 2025-08-14T23:40:40.9895512Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_angle_cuda_int8 PASSED [0.0051s] [ 65%] 2025-08-14T23:40:40.9895783Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_asin_cuda_bool PASSED [0.0048s] [ 65%] 2025-08-14T23:40:40.9896205Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_asin_cuda_complex128 PASSED [0.0051s] [ 65%] 2025-08-14T23:40:40.9896520Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_asin_cuda_complex32 PASSED [0.0049s] [ 65%] 2025-08-14T23:40:40.9896845Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_asin_cuda_int16 PASSED [0.0051s] [ 65%] 2025-08-14T23:40:40.9897132Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_asin_cuda_int32 PASSED [0.0049s] [ 65%] 2025-08-14T23:40:40.9897448Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_asinh_cuda_bfloat16 PASSED [0.0051s] [ 65%] 2025-08-14T23:40:40.9897821Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_asinh_cuda_complex32 PASSED [0.0049s] [ 65%] 2025-08-14T23:40:40.9898183Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_asinh_cuda_float64 PASSED [0.0051s] [ 65%] 2025-08-14T23:40:40.9898472Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_asinh_cuda_uint8 PASSED [0.0049s] [ 65%] 2025-08-14T23:40:40.9898789Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_atan_cuda_bool PASSED [0.0051s] [ 66%] 2025-08-14T23:40:40.9899080Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_atan_cuda_complex32 PASSED [0.0049s] [ 66%] 2025-08-14T23:40:40.9899474Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_atan_cuda_int16 PASSED [0.0051s] [ 66%] 2025-08-14T23:40:40.9899781Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_atan_cuda_int64 PASSED [0.0049s] [ 66%] 2025-08-14T23:40:40.9900110Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_atanh_cuda_bfloat16 PASSED [0.0051s] [ 66%] 2025-08-14T23:40:40.9900414Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_atanh_cuda_complex128 PASSED [0.0049s] [ 66%] 2025-08-14T23:40:40.9900751Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_atanh_cuda_complex32 PASSED [0.0052s] [ 66%] 2025-08-14T23:40:40.9901100Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_atanh_cuda_float16 PASSED [0.0049s] [ 66%] 2025-08-14T23:40:40.9901408Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_atanh_cuda_float32 PASSED [0.0051s] [ 66%] 2025-08-14T23:40:40.9901727Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_atanh_cuda_int32 PASSED [0.0049s] [ 66%] 2025-08-14T23:40:40.9902019Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_ceil_cuda_bfloat16 PASSED [0.0051s] [ 66%] 2025-08-14T23:40:40.9902337Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_ceil_cuda_float64 PASSED [0.0049s] [ 66%] 2025-08-14T23:40:40.9902621Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_ceil_cuda_int64 PASSED [0.0050s] [ 66%] 2025-08-14T23:40:40.9902974Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_ceil_cuda_int8 PASSED [0.0048s] [ 66%] 2025-08-14T23:40:40.9903308Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_conj_physical_cuda_complex128 PASSED [0.0051s] [ 66%] 2025-08-14T23:40:40.9903655Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_conj_physical_cuda_float64 PASSED [0.0048s] [ 67%] 2025-08-14T23:40:40.9903967Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_conj_physical_cuda_int32 PASSED [0.0049s] [ 67%] 2025-08-14T23:40:40.9904371Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_conj_physical_cuda_int8 PASSED [0.0047s] [ 67%] 2025-08-14T23:40:40.9904689Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_deg2rad_cuda_bool PASSED [0.0051s] [ 67%] 2025-08-14T23:40:40.9905026Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_deg2rad_cuda_int64 PASSED [0.0049s] [ 67%] 2025-08-14T23:40:40.9905375Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_deg2rad_cuda_uint8 PASSED [0.0051s] [ 67%] 2025-08-14T23:40:40.9905710Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_erf_cuda_bfloat16 PASSED [0.0049s] [ 67%] 2025-08-14T23:40:40.9905995Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_erf_cuda_float16 PASSED [0.0051s] [ 67%] 2025-08-14T23:40:40.9906317Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_erf_cuda_int8 PASSED [0.0049s] [ 67%] 2025-08-14T23:40:40.9906622Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_erfinv_cuda_bool PASSED [0.0051s] [ 67%] 2025-08-14T23:40:40.9906945Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_erfinv_cuda_float16 PASSED [0.0049s] [ 67%] 2025-08-14T23:40:40.9907294Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_expm1_cuda_bool PASSED [0.0051s] [ 67%] 2025-08-14T23:40:40.9907634Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_expm1_cuda_complex64 PASSED [0.0051s] [ 67%] 2025-08-14T23:40:40.9907912Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_expm1_cuda_float16 PASSED [0.0051s] [ 67%] 2025-08-14T23:40:40.9908275Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_expm1_cuda_float64 PASSED [0.0049s] [ 67%] 2025-08-14T23:40:40.9908563Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_expm1_cuda_int64 PASSED [0.0051s] [ 68%] 2025-08-14T23:40:40.9908968Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_floor_cuda_bfloat16 PASSED [0.0049s] [ 68%] 2025-08-14T23:40:40.9909265Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_floor_cuda_float16 PASSED [0.0051s] [ 68%] 2025-08-14T23:40:40.9909575Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_floor_cuda_float32 PASSED [0.0049s] [ 68%] 2025-08-14T23:40:40.9909882Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_floor_cuda_int8 PASSED [0.0050s] [ 68%] 2025-08-14T23:40:40.9910229Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_floor_cuda_uint8 PASSED [0.0048s] [ 68%] 2025-08-14T23:40:40.9910521Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_frac_cuda_bfloat16 PASSED [0.0051s] [ 68%] 2025-08-14T23:40:40.9910843Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_frac_cuda_float32 PASSED [0.0049s] [ 68%] 2025-08-14T23:40:40.9911142Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_isinf_cuda_bfloat16 PASSED [0.0051s] [ 68%] 2025-08-14T23:40:40.9911485Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_isinf_cuda_complex64 PASSED [0.0049s] [ 68%] 2025-08-14T23:40:40.9911793Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_isinf_cuda_float16 PASSED [0.0051s] [ 68%] 2025-08-14T23:40:40.9912127Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_isinf_cuda_uint8 PASSED [0.0048s] [ 68%] 2025-08-14T23:40:40.9912427Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_isnan_cuda_complex128 PASSED [0.0049s] [ 68%] 2025-08-14T23:40:40.9912746Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_isnan_cuda_float64 PASSED [0.0048s] [ 68%] 2025-08-14T23:40:40.9913072Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_isnan_cuda_int16 PASSED [0.0049s] [ 68%] 2025-08-14T23:40:40.9913390Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_isnan_cuda_int64 PASSED [0.0048s] [ 69%] 2025-08-14T23:40:40.9913790Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_isneginf_cuda_bfloat16 PASSED [0.0051s] [ 69%] 2025-08-14T23:40:40.9914086Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_isneginf_cuda_int16 PASSED [0.0048s] [ 69%] 2025-08-14T23:40:40.9914463Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_isneginf_cuda_int64 PASSED [0.0050s] [ 69%] 2025-08-14T23:40:40.9914741Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_isneginf_cuda_int8 PASSED [0.0048s] [ 69%] 2025-08-14T23:40:40.9915129Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_isposinf_cuda_bfloat16 PASSED [0.0051s] [ 69%] 2025-08-14T23:40:40.9915423Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_isposinf_cuda_int16 PASSED [0.0048s] [ 69%] 2025-08-14T23:40:40.9915754Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_isposinf_cuda_int32 PASSED [0.0050s] [ 69%] 2025-08-14T23:40:40.9916050Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_log1p_cuda_bfloat16 PASSED [0.0049s] [ 69%] 2025-08-14T23:40:40.9916423Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_log1p_cuda_float64 PASSED [0.0051s] [ 69%] 2025-08-14T23:40:40.9916740Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_log1p_cuda_int64 PASSED [0.0049s] [ 69%] 2025-08-14T23:40:40.9917076Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_log1p_cuda_int8 PASSED [0.0051s] [ 69%] 2025-08-14T23:40:40.9917366Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_log1p_cuda_uint8 PASSED [0.0049s] [ 69%] 2025-08-14T23:40:40.9917995Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_masked_amax_cuda_int16 SKIPPED [0.0019s] (masked.amax does not support input with torch.sparse_csr layout and torch.int16 dtype) [ 69%] 2025-08-14T23:40:40.9918549Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_masked_amax_cuda_int64 SKIPPED [0.0017s] (masked.amax does not support input with torch.sparse_csr layout and torch.int64 dtype) [ 69%] 2025-08-14T23:40:40.9919124Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_masked_amax_cuda_uint8 SKIPPED [0.0017s] (masked.amax does not support input with torch.sparse_csr layout and torch.uint8 dtype) [ 69%] 2025-08-14T23:40:40.9919460Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_masked_amin_cuda_float64 PASSED [0.1282s] [ 70%] 2025-08-14T23:40:40.9920022Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_masked_amin_cuda_int16 SKIPPED [0.0018s] (masked.amin does not support input with torch.sparse_csr layout and torch.int16 dtype) [ 70%] 2025-08-14T23:40:40.9920593Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_masked_amin_cuda_int32 SKIPPED [0.0019s] (masked.amin does not support input with torch.sparse_csr layout and torch.int32 dtype) [ 70%] 2025-08-14T23:40:40.9920964Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_masked_mean_cuda_bfloat16 PASSED [0.1140s] [ 70%] 2025-08-14T23:40:40.9921566Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_masked_mean_cuda_complex64 SKIPPED [0.0019s] (masked.mean does not support input with torch.sparse_csr layout and torch.complex64 dtype) [ 70%] 2025-08-14T23:40:40.9921903Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_masked_prod_cuda_bfloat16 PASSED [0.0806s] [ 70%] 2025-08-14T23:40:40.9922258Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_masked_prod_cuda_complex64 PASSED [0.0564s] [ 70%] 2025-08-14T23:40:40.9922571Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_masked_prod_cuda_float16 PASSED [0.0806s] [ 70%] 2025-08-14T23:40:40.9922930Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_masked_sum_cuda_complex128 PASSED [0.0561s] [ 70%] 2025-08-14T23:40:40.9923294Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_masked_sum_cuda_int32 PASSED [0.0561s] [ 70%] 2025-08-14T23:40:40.9923670Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_masked_sum_cuda_int64 PASSED [0.0547s] [ 70%] 2025-08-14T23:40:40.9924017Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_mul_cuda_bool PASSED [0.0232s] [ 70%] 2025-08-14T23:40:40.9924362Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_mul_cuda_complex128 PASSED [0.0113s] [ 70%] 2025-08-14T23:40:40.9924651Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_mul_cuda_float16 PASSED [0.0110s] [ 70%] 2025-08-14T23:40:40.9924968Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_mul_cuda_float32 PASSED [0.0108s] [ 70%] 2025-08-14T23:40:40.9925279Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_mul_cuda_float64 PASSED [0.0109s] [ 71%] 2025-08-14T23:40:40.9925617Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_neg_cuda_float32 PASSED [0.0049s] [ 71%] 2025-08-14T23:40:40.9925982Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_neg_cuda_float64 PASSED [0.0050s] [ 71%] 2025-08-14T23:40:40.9926357Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_nn_functional_relu_cuda_bfloat16 PASSED [0.0065s] [ 71%] 2025-08-14T23:40:40.9926693Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_nn_functional_relu_cuda_float64 PASSED [0.0067s] [ 71%] 2025-08-14T23:40:40.9927062Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_nn_functional_relu_cuda_int32 PASSED [0.0063s] [ 71%] 2025-08-14T23:40:40.9927403Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_nn_functional_relu_cuda_int8 PASSED [0.0064s] [ 71%] 2025-08-14T23:40:40.9927821Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_positive_cuda_bfloat16 PASSED [0.0048s] [ 71%] 2025-08-14T23:40:40.9928143Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_positive_cuda_complex128 PASSED [0.0050s] [ 71%] 2025-08-14T23:40:40.9928481Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_positive_cuda_float64 PASSED [0.0048s] [ 71%] 2025-08-14T23:40:40.9928770Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_rad2deg_cuda_float32 PASSED [0.0050s] [ 71%] 2025-08-14T23:40:40.9929148Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_rad2deg_cuda_int16 PASSED [0.0049s] [ 71%] 2025-08-14T23:40:40.9929441Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_rad2deg_cuda_int64 PASSED [0.0050s] [ 71%] 2025-08-14T23:40:40.9929794Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_randn_like_cuda_complex128 PASSED [0.0086s] [ 71%] 2025-08-14T23:40:40.9930111Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_randn_like_cuda_complex32 PASSED [0.0090s] [ 71%] 2025-08-14T23:40:40.9930426Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_round_cuda_bfloat16 PASSED [0.0049s] [ 72%] 2025-08-14T23:40:40.9930800Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_round_cuda_float64 PASSED [0.0051s] [ 72%] 2025-08-14T23:40:40.9931089Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_round_cuda_int8 PASSED [0.0047s] [ 72%] 2025-08-14T23:40:40.9931416Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_round_cuda_uint8 PASSED [0.0049s] [ 72%] 2025-08-14T23:40:40.9931704Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_sgn_cuda_float16 PASSED [0.0049s] [ 72%] 2025-08-14T23:40:40.9932006Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_sgn_cuda_float64 PASSED [0.0051s] [ 72%] 2025-08-14T23:40:40.9932386Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_sgn_cuda_int64 PASSED [0.0047s] [ 72%] 2025-08-14T23:40:40.9932718Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_sgn_cuda_int8 PASSED [0.0050s] [ 72%] 2025-08-14T23:40:40.9933002Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_sgn_cuda_uint8 PASSED [0.0048s] [ 72%] 2025-08-14T23:40:40.9933379Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_sign_cuda_float64 PASSED [0.0051s] [ 72%] 2025-08-14T23:40:40.9933681Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_sign_cuda_int32 PASSED [0.0047s] [ 72%] 2025-08-14T23:40:40.9934005Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_sign_cuda_int64 PASSED [0.0050s] [ 72%] 2025-08-14T23:40:40.9934305Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_sin_cuda_float64 PASSED [0.0049s] [ 72%] 2025-08-14T23:40:40.9934618Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_sin_cuda_int16 PASSED [0.0051s] [ 72%] 2025-08-14T23:40:40.9934901Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_sin_cuda_int64 PASSED [0.0049s] [ 72%] 2025-08-14T23:40:40.9935278Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_sin_cuda_int8 PASSED [0.0050s] [ 73%] 2025-08-14T23:40:40.9935562Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_sinh_cuda_bfloat16 PASSED [0.0049s] [ 73%] 2025-08-14T23:40:40.9935926Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_sinh_cuda_float16 PASSED [0.0051s] [ 73%] 2025-08-14T23:40:40.9936214Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_sinh_cuda_float32 PASSED [0.0049s] [ 73%] 2025-08-14T23:40:40.9936528Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_sinh_cuda_int64 PASSED [0.0051s] [ 73%] 2025-08-14T23:40:40.9936884Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_sqrt_cuda_bfloat16 PASSED [0.0049s] [ 73%] 2025-08-14T23:40:40.9937207Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_sqrt_cuda_complex128 PASSED [0.0051s] [ 73%] 2025-08-14T23:40:40.9937523Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_sqrt_cuda_float16 PASSED [0.0049s] [ 73%] 2025-08-14T23:40:40.9937853Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_sqrt_cuda_int16 PASSED [0.0051s] [ 73%] 2025-08-14T23:40:40.9938143Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_sqrt_cuda_int64 PASSED [0.0049s] [ 73%] 2025-08-14T23:40:40.9938478Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_sum_cuda_bfloat16 PASSED [0.0142s] [ 73%] 2025-08-14T23:40:40.9938770Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_sum_cuda_complex32 PASSED [0.0130s] [ 73%] 2025-08-14T23:40:40.9939105Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_sum_cuda_complex64 PASSED [0.0140s] [ 73%] 2025-08-14T23:40:40.9939410Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_sum_cuda_float64 PASSED [0.0141s] [ 73%] 2025-08-14T23:40:40.9939739Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_sum_cuda_int8 PASSED [0.0136s] [ 73%] 2025-08-14T23:40:40.9940019Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_tan_cuda_bool PASSED [0.0050s] [ 74%] 2025-08-14T23:40:40.9940352Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_tan_cuda_complex128 PASSED [0.0051s] [ 74%] 2025-08-14T23:40:40.9940626Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_tan_cuda_float32 PASSED [0.0049s] [ 74%] 2025-08-14T23:40:40.9940974Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_tan_cuda_int16 PASSED [0.0051s] [ 74%] 2025-08-14T23:40:40.9941268Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_tanh_cuda_bfloat16 PASSED [0.0049s] [ 74%] 2025-08-14T23:40:40.9941672Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_tanh_cuda_complex128 PASSED [0.0051s] [ 74%] 2025-08-14T23:40:40.9941969Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_tanh_cuda_complex32 PASSED [0.0049s] [ 74%] 2025-08-14T23:40:40.9942285Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_tanh_cuda_complex64 PASSED [0.0051s] [ 74%] 2025-08-14T23:40:40.9942652Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_tanh_cuda_float64 PASSED [0.0049s] [ 74%] 2025-08-14T23:40:40.9943003Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_tanh_cuda_int16 PASSED [0.0051s] [ 74%] 2025-08-14T23:40:40.9943283Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_tanh_cuda_int8 PASSED [0.0049s] [ 74%] 2025-08-14T23:40:40.9943600Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_tanh_cuda_uint8 PASSED [0.0051s] [ 74%] 2025-08-14T23:40:40.9943902Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_to_sparse_cuda_bool PASSED [0.0055s] [ 74%] 2025-08-14T23:40:40.9944302Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_to_sparse_cuda_float64 PASSED [0.0057s] [ 74%] 2025-08-14T23:40:40.9944672Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_trunc_cuda_bfloat16 PASSED [0.0049s] [ 74%] 2025-08-14T23:40:40.9944965Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_trunc_cuda_float16 PASSED [0.0052s] [ 75%] 2025-08-14T23:40:40.9945280Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_trunc_cuda_int8 PASSED [0.0048s] [ 75%] 2025-08-14T23:40:40.9945601Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_zeros_like_cuda_complex64 PASSED [0.0075s] [ 75%] 2025-08-14T23:40:40.9946020Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_zeros_like_cuda_float16 PASSED [0.0072s] [ 75%] 2025-08-14T23:40:40.9946351Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_zeros_like_cuda_float64 PASSED [0.0074s] [ 75%] 2025-08-14T23:40:40.9946687Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_zeros_like_cuda_int16 PASSED [0.0070s] [ 75%] 2025-08-14T23:40:40.9946986Z test_sparse_csr.py::TestSparseCompressedCUDA::test_consistency_SparseCSR_zeros_like_cuda_int8 PASSED [0.0072s] [ 75%] 2025-08-14T23:40:40.9947287Z test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_SparseBSC_cuda_bfloat16 PASSED [0.0114s] [ 75%] 2025-08-14T23:40:40.9947541Z test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_SparseBSC_cuda_bool PASSED [0.0108s] [ 75%] 2025-08-14T23:40:40.9947871Z test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_SparseBSC_cuda_float16 PASSED [0.0112s] [ 75%] 2025-08-14T23:40:40.9948125Z test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_SparseBSC_cuda_float32 PASSED [0.0112s] [ 75%] 2025-08-14T23:40:40.9948412Z test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_SparseBSC_cuda_int16 PASSED [0.0107s] [ 75%] 2025-08-14T23:40:40.9948673Z test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_SparseBSR_cuda_bfloat16 PASSED [0.0111s] [ 75%] 2025-08-14T23:40:40.9948971Z test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_SparseBSR_cuda_complex64 PASSED [0.0113s] [ 75%] 2025-08-14T23:40:40.9949273Z test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_SparseBSR_cuda_float32 PASSED [0.0111s] [ 75%] 2025-08-14T23:40:40.9949570Z test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_SparseBSR_cuda_int64 PASSED [0.0108s] [ 76%] 2025-08-14T23:40:40.9949822Z test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_SparseBSR_cuda_uint8 PASSED [0.0106s] [ 76%] 2025-08-14T23:40:40.9950103Z test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_SparseCSC_cuda_bool PASSED [0.0224s] [ 76%] 2025-08-14T23:40:40.9950381Z test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_SparseCSC_cuda_complex64 PASSED [0.0229s] [ 76%] 2025-08-14T23:40:40.9950739Z test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_SparseCSC_cuda_float16 PASSED [0.0227s] [ 76%] 2025-08-14T23:40:40.9951021Z test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_SparseCSR_cuda_complex128 PASSED [0.0229s] [ 76%] 2025-08-14T23:40:40.9951308Z test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_SparseCSR_cuda_float16 PASSED [0.0228s] [ 76%] 2025-08-14T23:40:40.9951638Z test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_SparseCSR_cuda_float64 PASSED [0.0228s] [ 76%] 2025-08-14T23:40:40.9951914Z test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_SparseCSR_cuda_int16 PASSED [0.0224s] [ 76%] 2025-08-14T23:40:40.9952168Z test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_errors_SparseBSC_cuda_bool PASSED [0.0120s] [ 76%] 2025-08-14T23:40:40.9952524Z test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_errors_SparseBSC_cuda_complex128 PASSED [0.0115s] [ 76%] 2025-08-14T23:40:40.9952804Z test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_errors_SparseBSC_cuda_float32 PASSED [0.0115s] [ 76%] 2025-08-14T23:40:40.9953125Z test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_errors_SparseBSR_cuda_float64 PASSED [0.0115s] [ 76%] 2025-08-14T23:40:40.9953451Z test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_errors_SparseBSR_cuda_int32 PASSED [0.0115s] [ 76%] 2025-08-14T23:40:40.9953745Z test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_errors_SparseBSR_cuda_int64 PASSED [0.0115s] [ 76%] 2025-08-14T23:40:40.9954035Z test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_errors_SparseBSR_cuda_uint8 PASSED [0.0115s] [ 76%] 2025-08-14T23:40:40.9954357Z test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_errors_SparseCSC_cuda_bfloat16 PASSED [0.0055s] [ 77%] 2025-08-14T23:40:40.9954656Z test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_errors_SparseCSC_cuda_complex128 PASSED [0.0055s] [ 77%] 2025-08-14T23:40:40.9955043Z test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_errors_SparseCSC_cuda_float16 PASSED [0.0055s] [ 77%] 2025-08-14T23:40:40.9955326Z test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_errors_SparseCSR_cuda_bfloat16 PASSED [0.0051s] [ 77%] 2025-08-14T23:40:40.9955638Z test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_errors_SparseCSR_cuda_bool PASSED [0.0051s] [ 77%] 2025-08-14T23:40:40.9955937Z test_sparse_csr.py::TestSparseCompressedCUDA::test_copy_errors_SparseCSR_cuda_complex128 PASSED [0.0052s] [ 77%] 2025-08-14T23:40:40.9956236Z test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_SparseCSC_cuda_bool PASSED [0.0147s] [ 77%] 2025-08-14T23:40:40.9956500Z test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_SparseCSC_cuda_float32 PASSED [0.0145s] [ 77%] 2025-08-14T23:40:40.9956784Z test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_SparseCSC_cuda_int16 PASSED [0.0144s] [ 77%] 2025-08-14T23:40:40.9957021Z test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_SparseCSC_cuda_int32 PASSED [0.0144s] [ 77%] 2025-08-14T23:40:40.9957353Z test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_SparseCSC_cuda_int8 PASSED [0.0144s] [ 77%] 2025-08-14T23:40:40.9957610Z test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_SparseCSC_cuda_uint8 PASSED [0.0144s] [ 77%] 2025-08-14T23:40:40.9957914Z test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_SparseCSR_cuda_complex64 PASSED [0.0143s] [ 77%] 2025-08-14T23:40:40.9958173Z test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_SparseCSR_cuda_float16 PASSED [0.0145s] [ 77%] 2025-08-14T23:40:40.9958445Z test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_SparseCSR_cuda_int32 PASSED [0.0143s] [ 77%] 2025-08-14T23:40:40.9958720Z test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_SparseCSR_cuda_int64 PASSED [0.0145s] [ 78%] 2025-08-14T23:40:40.9959047Z test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_errors_SparseCSC_cuda_bool PASSED [0.0010s] [ 78%] 2025-08-14T23:40:40.9959335Z test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_errors_SparseCSC_cuda_complex128 PASSED [0.0008s] [ 78%] 2025-08-14T23:40:40.9959647Z test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_errors_SparseCSC_cuda_float32 PASSED [0.0008s] [ 78%] 2025-08-14T23:40:40.9959983Z test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_errors_SparseCSC_cuda_int32 PASSED [0.0008s] [ 78%] 2025-08-14T23:40:40.9960316Z test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_errors_SparseCSC_cuda_uint8 PASSED [0.0008s] [ 78%] 2025-08-14T23:40:40.9960772Z test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_errors_SparseCSR_cuda_bool PASSED [0.0008s] [ 78%] 2025-08-14T23:40:40.9961042Z test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_errors_SparseCSR_cuda_int16 PASSED [0.0008s] [ 78%] 2025-08-14T23:40:40.9961436Z test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSC_SparseBSC_cuda_complex128 PASSED [0.0226s] [ 78%] 2025-08-14T23:40:40.9961732Z test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSC_SparseBSC_cuda_int16 PASSED [0.0225s] [ 78%] 2025-08-14T23:40:40.9962084Z test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSC_SparseBSC_cuda_int32 PASSED [0.0223s] [ 78%] 2025-08-14T23:40:40.9962407Z test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSC_SparseCSC_cuda_complex64 PASSED [0.0182s] [ 78%] 2025-08-14T23:40:40.9962811Z test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSC_SparseCSC_cuda_float32 PASSED [0.0180s] [ 78%] 2025-08-14T23:40:40.9963105Z test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSC_SparseCSC_cuda_int16 PASSED [0.0180s] [ 78%] 2025-08-14T23:40:40.9963460Z test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSC_SparseCSR_cuda_bfloat16 PASSED [0.0180s] [ 78%] 2025-08-14T23:40:40.9963756Z test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSC_SparseCSR_cuda_complex128 PASSED [0.0181s] [ 79%] 2025-08-14T23:40:40.9964124Z test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSC_SparseCSR_cuda_int16 PASSED [0.0181s] [ 79%] 2025-08-14T23:40:40.9964491Z test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSR_SparseBSC_cuda_bfloat16 PASSED [0.0181s] [ 79%] 2025-08-14T23:40:40.9964824Z test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSR_SparseBSC_cuda_bool PASSED [0.0181s] [ 79%] 2025-08-14T23:40:40.9965143Z test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSR_SparseBSC_cuda_float32 PASSED [0.0181s] [ 79%] 2025-08-14T23:40:40.9965461Z test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSR_SparseBSC_cuda_float64 PASSED [0.0180s] [ 79%] 2025-08-14T23:40:40.9965776Z test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSR_SparseBSC_cuda_int16 PASSED [0.0181s] [ 79%] 2025-08-14T23:40:40.9966115Z test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSR_SparseBSC_cuda_int32 PASSED [0.0181s] [ 79%] 2025-08-14T23:40:40.9966404Z test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSR_SparseBSC_cuda_int8 PASSED [0.0180s] [ 79%] 2025-08-14T23:40:40.9966737Z test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSR_SparseBSC_cuda_uint8 PASSED [0.0181s] [ 79%] 2025-08-14T23:40:40.9967044Z test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSR_SparseCSC_cuda_bfloat16 PASSED [0.0180s] [ 79%] 2025-08-14T23:40:40.9967373Z test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSR_SparseCSC_cuda_bool PASSED [0.0181s] [ 79%] 2025-08-14T23:40:40.9967709Z test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSR_SparseCSC_cuda_complex128 PASSED [0.0180s] [ 79%] 2025-08-14T23:40:40.9968122Z test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSR_SparseCSC_cuda_float16 PASSED [0.0180s] [ 79%] 2025-08-14T23:40:40.9968426Z test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSR_SparseCSC_cuda_int32 PASSED [0.0181s] [ 79%] 2025-08-14T23:40:40.9968750Z test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSR_SparseCSC_cuda_int64 PASSED [0.0181s] [ 80%] 2025-08-14T23:40:40.9969026Z test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSR_SparseCSC_cuda_uint8 PASSED [0.0180s] [ 80%] 2025-08-14T23:40:40.9969457Z test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSR_SparseCSR_cuda_bool PASSED [0.0180s] [ 80%] 2025-08-14T23:40:40.9969767Z test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSR_SparseCSR_cuda_complex128 PASSED [0.0180s] [ 80%] 2025-08-14T23:40:40.9970161Z test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSR_SparseCSR_cuda_int64 PASSED [0.0181s] [ 80%] 2025-08-14T23:40:40.9970454Z test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseBSR_SparseCSR_cuda_int8 PASSED [0.0180s] [ 80%] 2025-08-14T23:40:40.9970776Z test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSC_SparseBSC_cuda_bfloat16 PASSED [0.0166s] [ 80%] 2025-08-14T23:40:40.9971101Z test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSC_SparseBSC_cuda_float64 PASSED [0.0166s] [ 80%] 2025-08-14T23:40:40.9971454Z test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSC_SparseBSC_cuda_uint8 PASSED [0.0165s] [ 80%] 2025-08-14T23:40:40.9971757Z test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSC_SparseBSR_cuda_bfloat16 PASSED [0.0165s] [ 80%] 2025-08-14T23:40:40.9972152Z test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSC_SparseBSR_cuda_complex128 PASSED [0.0165s] [ 80%] 2025-08-14T23:40:40.9972469Z test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSC_SparseBSR_cuda_int16 PASSED [0.0165s] [ 80%] 2025-08-14T23:40:40.9972780Z test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSC_SparseBSR_cuda_int32 PASSED [0.0165s] [ 80%] 2025-08-14T23:40:40.9973128Z test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSC_SparseBSR_cuda_int64 PASSED [0.0166s] [ 80%] 2025-08-14T23:40:40.9973420Z test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSC_SparseCSC_cuda_bool PASSED [0.0208s] [ 80%] 2025-08-14T23:40:40.9973808Z test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSC_SparseCSC_cuda_complex64 PASSED [0.0207s] [ 81%] 2025-08-14T23:40:40.9974117Z test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSC_SparseCSR_cuda_bfloat16 PASSED [0.0165s] [ 81%] 2025-08-14T23:40:40.9974565Z test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSC_SparseCSR_cuda_complex128 PASSED [0.0166s] [ 81%] 2025-08-14T23:40:40.9974886Z test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSC_SparseCSR_cuda_complex64 PASSED [0.0166s] [ 81%] 2025-08-14T23:40:40.9975209Z test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSC_SparseCSR_cuda_int8 PASSED [0.0165s] [ 81%] 2025-08-14T23:40:40.9975499Z test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSC_SparseCSR_cuda_uint8 PASSED [0.0169s] [ 81%] 2025-08-14T23:40:40.9975829Z test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSR_SparseBSR_cuda_bfloat16 PASSED [0.0166s] [ 81%] 2025-08-14T23:40:40.9976107Z test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSR_SparseBSR_cuda_bool PASSED [0.0166s] [ 81%] 2025-08-14T23:40:40.9976491Z test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSR_SparseBSR_cuda_float64 PASSED [0.0166s] [ 81%] 2025-08-14T23:40:40.9976784Z test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSR_SparseBSR_cuda_int16 PASSED [0.0165s] [ 81%] 2025-08-14T23:40:40.9977111Z test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSR_SparseBSR_cuda_int32 PASSED [0.0166s] [ 81%] 2025-08-14T23:40:40.9977402Z test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSR_SparseBSR_cuda_int8 PASSED [0.0165s] [ 81%] 2025-08-14T23:40:40.9977730Z test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSR_SparseBSR_cuda_uint8 PASSED [0.0166s] [ 81%] 2025-08-14T23:40:40.9978043Z test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSR_SparseCSC_cuda_bool PASSED [0.0165s] [ 81%] 2025-08-14T23:40:40.9978465Z test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSR_SparseCSC_cuda_complex64 PASSED [0.0165s] [ 81%] 2025-08-14T23:40:40.9978766Z test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSR_SparseCSC_cuda_float16 PASSED [0.0166s] [ 82%] 2025-08-14T23:40:40.9979094Z test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSR_SparseCSC_cuda_int64 PASSED [0.0166s] [ 82%] 2025-08-14T23:40:40.9979473Z test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSR_SparseCSR_cuda_float32 PASSED [0.0208s] [ 82%] 2025-08-14T23:40:40.9979830Z test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSR_SparseCSR_cuda_float64 PASSED [0.0208s] [ 82%] 2025-08-14T23:40:40.9984924Z test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSR_SparseCSR_cuda_int16 PASSED [0.0208s] [ 82%] 2025-08-14T23:40:40.9985250Z test_sparse_csr.py::TestSparseCompressedCUDA::test_empty_like_SparseCSR_SparseCSR_cuda_int8 PASSED [0.0208s] [ 82%] 2025-08-14T23:40:40.9985608Z test_sparse_csr.py::TestSparseCompressedCUDA::test_invalid_input_SparseCSR_target_sparse_compressed_tensor_cuda PASSED [0.0041s] [ 82%] 2025-08-14T23:40:40.9986012Z test_sparse_csr.py::TestSparseCompressedCUDA::test_invalid_input_csr_large_cuda SKIPPED [0.0008s] (Only runs on cpu) [ 82%] 2025-08-14T23:40:40.9986300Z test_sparse_csr.py::TestSparseCompressedCUDA::test_layout_SparseBSC_cuda SKIPPED [0.0008s] (Only runs on cpu) [ 82%] 2025-08-14T23:40:40.9986545Z test_sparse_csr.py::TestSparseCompressedCUDA::test_pickle_SparseBSR_cuda_float64 PASSED [0.0913s] [ 82%] 2025-08-14T23:40:40.9986787Z test_sparse_csr.py::TestSparseCompressedCUDA::test_pickle_SparseCSC_cuda_float64 PASSED [0.0877s] [ 82%] 2025-08-14T23:40:40.9987018Z test_sparse_csr.py::TestSparseCompressedCUDA::test_pickle_SparseCSR_cuda_float64 PASSED [0.0878s] [ 82%] 2025-08-14T23:40:40.9987245Z test_sparse_csr.py::TestSparseCompressedCUDA::test_print_SparseCSC_cuda PASSED [0.3005s] [ 82%] 2025-08-14T23:40:40.9987569Z test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseBSC_int32_cuda_float32 PASSED [0.1970s] [ 82%] 2025-08-14T23:40:40.9987842Z test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseBSC_int32_cuda_int16 PASSED [0.1506s] [ 82%] 2025-08-14T23:40:40.9988101Z test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseBSC_int32_cuda_int32 PASSED [0.1498s] [ 83%] 2025-08-14T23:40:40.9988365Z test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseBSC_int32_cuda_int64 PASSED [0.1494s] [ 83%] 2025-08-14T23:40:40.9988632Z test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseBSC_int64_cuda_bfloat16 PASSED [0.1948s] [ 83%] 2025-08-14T23:40:40.9988911Z test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseBSC_int64_cuda_complex128 PASSED [0.2051s] [ 83%] 2025-08-14T23:40:40.9989165Z test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseBSC_int64_cuda_int32 PASSED [0.1504s] [ 83%] 2025-08-14T23:40:40.9989427Z test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseBSC_int64_cuda_uint8 PASSED [0.1484s] [ 83%] 2025-08-14T23:40:40.9989686Z test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseBSR_int32_cuda_bool PASSED [0.1496s] [ 83%] 2025-08-14T23:40:40.9989951Z test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseBSR_int32_cuda_float16 PASSED [0.1947s] [ 83%] 2025-08-14T23:40:40.9990204Z test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseBSR_int32_cuda_int16 PASSED [0.1499s] [ 83%] 2025-08-14T23:40:40.9990461Z test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseBSR_int32_cuda_int64 PASSED [0.1498s] [ 83%] 2025-08-14T23:40:40.9990727Z test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseBSR_int64_cuda_bfloat16 PASSED [0.1938s] [ 83%] 2025-08-14T23:40:40.9991003Z test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseBSR_int64_cuda_complex128 PASSED [0.2055s] [ 83%] 2025-08-14T23:40:40.9991258Z test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseBSR_int64_cuda_int8 PASSED [0.1491s] [ 83%] 2025-08-14T23:40:40.9991596Z test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseBSR_int64_cuda_uint8 PASSED [0.1492s] [ 83%] 2025-08-14T23:40:40.9991865Z test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseCSC_int32_cuda_complex128 PASSED [0.2010s] [ 83%] 2025-08-14T23:40:40.9992131Z test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseCSC_int32_cuda_float16 PASSED [0.1900s] [ 84%] 2025-08-14T23:40:40.9992448Z test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseCSC_int32_cuda_int8 PASSED [0.1449s] [ 84%] 2025-08-14T23:40:40.9992705Z test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseCSC_int32_cuda_uint8 PASSED [0.1437s] [ 84%] 2025-08-14T23:40:40.9992957Z test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseCSC_int64_cuda_bool PASSED [0.1428s] [ 84%] 2025-08-14T23:40:40.9993226Z test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseCSC_int64_cuda_float16 PASSED [0.1875s] [ 84%] 2025-08-14T23:40:40.9993486Z test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseCSC_int64_cuda_float32 PASSED [0.1876s] [ 84%] 2025-08-14T23:40:40.9993794Z test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseCSC_int64_cuda_int16 PASSED [0.1442s] [ 84%] 2025-08-14T23:40:40.9994048Z test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseCSC_int64_cuda_int64 PASSED [0.1429s] [ 84%] 2025-08-14T23:40:40.9994300Z test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseCSR_int32_cuda_bool PASSED [0.1443s] [ 84%] 2025-08-14T23:40:40.9994568Z test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseCSR_int32_cuda_complex64 PASSED [0.1994s] [ 84%] 2025-08-14T23:40:40.9994829Z test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseCSR_int32_cuda_float32 PASSED [0.1885s] [ 84%] 2025-08-14T23:40:40.9995216Z test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseCSR_int32_cuda_int16 PASSED [0.1457s] [ 84%] 2025-08-14T23:40:40.9995480Z test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseCSR_int32_cuda_int32 PASSED [0.1441s] [ 84%] 2025-08-14T23:40:40.9995728Z test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseCSR_int32_cuda_int8 PASSED [0.1436s] [ 84%] 2025-08-14T23:40:40.9995980Z test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseCSR_int64_cuda_bool PASSED [0.1429s] [ 84%] 2025-08-14T23:40:40.9996248Z test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseCSR_int64_cuda_complex128 PASSED [0.1997s] [ 84%] 2025-08-14T23:40:40.9996512Z test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseCSR_int64_cuda_complex64 PASSED [0.1999s] [ 85%] 2025-08-14T23:40:40.9996761Z test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseCSR_int64_cuda_int16 PASSED [0.1450s] [ 85%] 2025-08-14T23:40:40.9997013Z test_sparse_csr.py::TestSparseCompressedCUDA::test_select_copy_SparseCSR_int64_cuda_uint8 PASSED [0.1446s] [ 85%] 2025-08-14T23:40:40.9997353Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_list_SparseBSC_cuda_bfloat16 PASSED [0.0932s] [ 85%] 2025-08-14T23:40:40.9997685Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_list_SparseBSC_cuda_float16 PASSED [0.0919s] [ 85%] 2025-08-14T23:40:40.9998010Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_list_SparseBSC_cuda_int32 PASSED [0.0492s] [ 85%] 2025-08-14T23:40:40.9998344Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_list_SparseBSR_cuda_bfloat16 PASSED [0.0915s] [ 85%] 2025-08-14T23:40:40.9998672Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_list_SparseBSR_cuda_float16 PASSED [0.0915s] [ 85%] 2025-08-14T23:40:40.9999007Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_list_SparseBSR_cuda_float32 PASSED [0.0914s] [ 85%] 2025-08-14T23:40:40.9999394Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_list_SparseBSR_cuda_float64 PASSED [0.0913s] [ 85%] 2025-08-14T23:40:40.9999722Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_list_SparseBSR_cuda_int8 PASSED [0.0492s] [ 85%] 2025-08-14T23:40:41.0000060Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_list_SparseCSC_cuda_bfloat16 PASSED [0.0890s] [ 85%] 2025-08-14T23:40:41.0000510Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_list_SparseCSC_cuda_complex128 PASSED [0.0922s] [ 85%] 2025-08-14T23:40:41.0000837Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_list_SparseCSC_cuda_int16 PASSED [0.0471s] [ 85%] 2025-08-14T23:40:41.0001172Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_list_SparseCSR_cuda_complex64 PASSED [0.0921s] [ 85%] 2025-08-14T23:40:41.0001505Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_list_SparseCSR_cuda_float16 PASSED [0.0888s] [ 86%] 2025-08-14T23:40:41.0001899Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_list_SparseCSR_cuda_float32 PASSED [0.0889s] [ 86%] 2025-08-14T23:40:41.0002240Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_list_SparseCSR_cuda_float64 PASSED [0.0886s] [ 86%] 2025-08-14T23:40:41.0002564Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_list_SparseCSR_cuda_int32 PASSED [0.0470s] [ 86%] 2025-08-14T23:40:41.0002898Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_tensor_SparseBSC_cuda_bool PASSED [0.0881s] [ 86%] 2025-08-14T23:40:41.0003304Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_tensor_SparseBSC_cuda_complex128 PASSED [0.1816s] [ 86%] 2025-08-14T23:40:41.0003650Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_tensor_SparseBSC_cuda_float16 PASSED [0.1741s] [ 86%] 2025-08-14T23:40:41.0003985Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_tensor_SparseBSC_cuda_float32 PASSED [0.1740s] [ 86%] 2025-08-14T23:40:41.0004332Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_tensor_SparseBSR_cuda_bfloat16 PASSED [0.1738s] [ 86%] 2025-08-14T23:40:41.0004676Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_tensor_SparseBSR_cuda_complex128 PASSED [0.1814s] [ 86%] 2025-08-14T23:40:41.0005008Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_tensor_SparseBSR_cuda_int32 PASSED [0.0880s] [ 86%] 2025-08-14T23:40:41.0005333Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_tensor_SparseCSC_cuda_bool PASSED [0.0846s] [ 86%] 2025-08-14T23:40:41.0005672Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_tensor_SparseCSC_cuda_float16 PASSED [0.1692s] [ 86%] 2025-08-14T23:40:41.0006004Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_tensor_SparseCSC_cuda_float32 PASSED [0.1694s] [ 86%] 2025-08-14T23:40:41.0006345Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_tensor_SparseCSC_cuda_float64 PASSED [0.1688s] [ 86%] 2025-08-14T23:40:41.0006672Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_tensor_SparseCSC_cuda_int32 PASSED [0.0845s] [ 87%] 2025-08-14T23:40:41.0007016Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_tensor_SparseCSR_cuda_float16 PASSED [0.1694s] [ 87%] 2025-08-14T23:40:41.0007347Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_____from_tensor_SparseCSR_cuda_float64 PASSED [0.1693s] [ 87%] 2025-08-14T23:40:41.0007759Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_list_SparseBSC_cuda_bool PASSED [0.0492s] [ 87%] 2025-08-14T23:40:41.0008128Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_list_SparseBSC_cuda_complex128 PASSED [0.0943s] [ 87%] 2025-08-14T23:40:41.0008533Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_list_SparseBSC_cuda_int64 PASSED [0.0488s] [ 87%] 2025-08-14T23:40:41.0008877Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_list_SparseBSC_cuda_int8 PASSED [0.0488s] [ 87%] 2025-08-14T23:40:41.0009220Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_list_SparseBSC_cuda_uint8 PASSED [0.0488s] [ 87%] 2025-08-14T23:40:41.0009584Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_list_SparseBSR_cuda_complex64 PASSED [0.0941s] [ 87%] 2025-08-14T23:40:41.0009938Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_list_SparseBSR_cuda_float16 PASSED [0.0911s] [ 87%] 2025-08-14T23:40:41.0010338Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_list_SparseBSR_cuda_float64 PASSED [0.0907s] [ 87%] 2025-08-14T23:40:41.0010688Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_list_SparseBSR_cuda_int16 PASSED [0.0488s] [ 87%] 2025-08-14T23:40:41.0011040Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_list_SparseBSR_cuda_int32 PASSED [0.0487s] [ 87%] 2025-08-14T23:40:41.0011378Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_list_SparseBSR_cuda_int8 PASSED [0.0487s] [ 87%] 2025-08-14T23:40:41.0011772Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_list_SparseCSC_cuda_bool PASSED [0.0469s] [ 87%] 2025-08-14T23:40:41.0012136Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_list_SparseCSC_cuda_complex128 PASSED [0.0918s] [ 88%] 2025-08-14T23:40:41.0012487Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_list_SparseCSC_cuda_float16 PASSED [0.0885s] [ 88%] 2025-08-14T23:40:41.0012830Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_list_SparseCSC_cuda_float32 PASSED [0.0885s] [ 88%] 2025-08-14T23:40:41.0013178Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_list_SparseCSC_cuda_float64 PASSED [0.0884s] [ 88%] 2025-08-14T23:40:41.0013517Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_list_SparseCSC_cuda_int32 PASSED [0.0472s] [ 88%] 2025-08-14T23:40:41.0013862Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_list_SparseCSC_cuda_int64 PASSED [0.0467s] [ 88%] 2025-08-14T23:40:41.0014201Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_list_SparseCSR_cuda_bool PASSED [0.0470s] [ 88%] 2025-08-14T23:40:41.0014560Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_list_SparseCSR_cuda_complex64 PASSED [0.0918s] [ 88%] 2025-08-14T23:40:41.0014917Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_tensor_SparseBSC_cuda_bfloat16 PASSED [0.1737s] [ 88%] 2025-08-14T23:40:41.0015283Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_tensor_SparseBSC_cuda_complex128 PASSED [0.1816s] [ 88%] 2025-08-14T23:40:41.0015637Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_tensor_SparseBSC_cuda_float32 PASSED [0.1735s] [ 88%] 2025-08-14T23:40:41.0016044Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_tensor_SparseBSC_cuda_uint8 PASSED [0.0881s] [ 88%] 2025-08-14T23:40:41.0016406Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_tensor_SparseBSR_cuda_bfloat16 PASSED [0.1733s] [ 88%] 2025-08-14T23:40:41.0016838Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_tensor_SparseBSR_cuda_complex128 PASSED [0.1810s] [ 88%] 2025-08-14T23:40:41.0017198Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_tensor_SparseBSR_cuda_float32 PASSED [0.1744s] [ 88%] 2025-08-14T23:40:41.0017545Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_tensor_SparseBSR_cuda_int16 PASSED [0.0882s] [ 89%] 2025-08-14T23:40:41.0017907Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_tensor_SparseCSC_cuda_bfloat16 PASSED [0.1688s] [ 89%] 2025-08-14T23:40:41.0018270Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_tensor_SparseCSC_cuda_complex128 PASSED [0.1767s] [ 89%] 2025-08-14T23:40:41.0018688Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_tensor_SparseCSC_cuda_complex64 PASSED [0.1785s] [ 89%] 2025-08-14T23:40:41.0019046Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_tensor_SparseCSC_cuda_float32 PASSED [0.1700s] [ 89%] 2025-08-14T23:40:41.0019398Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_tensor_SparseCSC_cuda_int8 PASSED [0.0851s] [ 89%] 2025-08-14T23:40:41.0019751Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_tensor_SparseCSR_cuda_bfloat16 PASSED [0.1716s] [ 89%] 2025-08-14T23:40:41.0020186Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_tensor_SparseCSR_cuda_complex128 PASSED [0.1792s] [ 89%] 2025-08-14T23:40:41.0020546Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor___factory_from_tensor_SparseCSR_cuda_int32 PASSED [0.0858s] [ 89%] 2025-08-14T23:40:41.0021013Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseBSC_cuda_complex128 SKIPPED [0.0008s] (nothing to test) [ 89%] 2025-08-14T23:40:41.0021471Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseBSR_cuda_complex128 SKIPPED [0.0008s] (nothing to test) [ 89%] 2025-08-14T23:40:41.0021922Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseBSR_cuda_float16 SKIPPED [0.0008s] (nothing to test) [ 89%] 2025-08-14T23:40:41.0022370Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseBSR_cuda_int64 SKIPPED [0.0009s] (nothing to test) [ 89%] 2025-08-14T23:40:41.0022815Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseBSR_cuda_int8 SKIPPED [0.0008s] (nothing to test) [ 89%] 2025-08-14T23:40:41.0023258Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseBSR_cuda_uint8 SKIPPED [0.0008s] (nothing to test) [ 89%] 2025-08-14T23:40:41.0023697Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseCSC_cuda_bool SKIPPED [0.0008s] (nothing to test) [ 90%] 2025-08-14T23:40:41.0024144Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseCSC_cuda_float64 SKIPPED [0.0008s] (nothing to test) [ 90%] 2025-08-14T23:40:41.0024642Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseCSC_cuda_int16 SKIPPED [0.0008s] (nothing to test) [ 90%] 2025-08-14T23:40:41.0025087Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseCSC_cuda_int32 SKIPPED [0.0008s] (nothing to test) [ 90%] 2025-08-14T23:40:41.0025579Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseCSC_cuda_int8 SKIPPED [0.0012s] (nothing to test) [ 90%] 2025-08-14T23:40:41.0026019Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseCSC_cuda_uint8 SKIPPED [0.0008s] (nothing to test) [ 90%] 2025-08-14T23:40:41.0026457Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseCSR_cuda_int16 SKIPPED [0.0008s] (nothing to test) [ 90%] 2025-08-14T23:40:41.0026906Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseCSR_cuda_uint8 SKIPPED [0.0008s] (nothing to test) [ 90%] 2025-08-14T23:40:41.0027369Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseBSC_cuda_complex64 PASSED [0.1724s] [ 90%] 2025-08-14T23:40:41.0027786Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseBSC_cuda_float32 PASSED [0.1645s] [ 90%] 2025-08-14T23:40:41.0028182Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseBSC_cuda_int64 PASSED [0.0844s] [ 90%] 2025-08-14T23:40:41.0028649Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseBSR_cuda_complex64 PASSED [0.1712s] [ 90%] 2025-08-14T23:40:41.0029063Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseBSR_cuda_float32 PASSED [0.1641s] [ 90%] 2025-08-14T23:40:41.0029461Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseBSR_cuda_uint8 PASSED [0.0843s] [ 90%] 2025-08-14T23:40:41.0029873Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseCSC_cuda_bfloat16 PASSED [0.1597s] [ 90%] 2025-08-14T23:40:41.0030279Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseCSC_cuda_complex64 PASSED [0.1682s] [ 91%] 2025-08-14T23:40:41.0030697Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseCSR_cuda_bfloat16 PASSED [0.1619s] [ 91%] 2025-08-14T23:40:41.0031095Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseCSR_cuda_bool PASSED [0.0824s] [ 91%] 2025-08-14T23:40:41.0031510Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseCSR_cuda_complex64 PASSED [0.1671s] [ 91%] 2025-08-14T23:40:41.0031910Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseCSR_cuda_float16 PASSED [0.1598s] [ 91%] 2025-08-14T23:40:41.0032312Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseCSR_cuda_int64 PASSED [0.0809s] [ 91%] 2025-08-14T23:40:41.0032704Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseCSR_cuda_uint8 PASSED [0.0811s] [ 91%] 2025-08-14T23:40:41.0033176Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseBSC_cuda_bfloat16 SKIPPED [0.0008s] (nothing to test) [ 91%] 2025-08-14T23:40:41.0033702Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseBSC_cuda_float32 SKIPPED [0.0008s] (nothing to test) [ 91%] 2025-08-14T23:40:41.0034168Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseBSC_cuda_int16 SKIPPED [0.0008s] (nothing to test) [ 91%] 2025-08-14T23:40:41.0034683Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseBSC_cuda_int64 SKIPPED [0.0009s] (nothing to test) [ 91%] 2025-08-14T23:40:41.0035140Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseBSC_cuda_uint8 SKIPPED [0.0008s] (nothing to test) [ 91%] 2025-08-14T23:40:41.0035611Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseBSR_cuda_float64 SKIPPED [0.0008s] (nothing to test) [ 91%] 2025-08-14T23:40:41.0036116Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseBSR_cuda_int16 SKIPPED [0.0008s] (nothing to test) [ 91%] 2025-08-14T23:40:41.0036592Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseCSC_cuda_bfloat16 SKIPPED [0.0008s] (nothing to test) [ 91%] 2025-08-14T23:40:41.0037062Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseCSC_cuda_complex64 SKIPPED [0.0008s] (nothing to test) [ 92%] 2025-08-14T23:40:41.0037571Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseCSC_cuda_float32 SKIPPED [0.0008s] (nothing to test) [ 92%] 2025-08-14T23:40:41.0038030Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseCSC_cuda_int8 SKIPPED [0.0009s] (nothing to test) [ 92%] 2025-08-14T23:40:41.0038501Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseCSR_cuda_bfloat16 SKIPPED [0.0008s] (nothing to test) [ 92%] 2025-08-14T23:40:41.0038975Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseCSR_cuda_complex128 SKIPPED [0.0008s] (nothing to test) [ 92%] 2025-08-14T23:40:41.0039445Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseCSR_cuda_complex64 SKIPPED [0.0008s] (nothing to test) [ 92%] 2025-08-14T23:40:41.0039915Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseCSR_cuda_float16 SKIPPED [0.0008s] (nothing to test) [ 92%] 2025-08-14T23:40:41.0040433Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseBSC_cuda_bfloat16 PASSED [0.1649s] [ 92%] 2025-08-14T23:40:41.0040861Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseBSC_cuda_float16 PASSED [0.1641s] [ 92%] 2025-08-14T23:40:41.0041278Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseBSC_cuda_int16 PASSED [0.0845s] [ 92%] 2025-08-14T23:40:41.0041694Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseBSC_cuda_int32 PASSED [0.0841s] [ 92%] 2025-08-14T23:40:41.0042110Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseBSC_cuda_int8 PASSED [0.0842s] [ 92%] 2025-08-14T23:40:41.0042612Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseBSR_cuda_complex64 PASSED [0.1707s] [ 92%] 2025-08-14T23:40:41.0043029Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseBSR_cuda_float16 PASSED [0.1635s] [ 92%] 2025-08-14T23:40:41.0043507Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseBSR_cuda_int32 PASSED [0.0844s] [ 92%] 2025-08-14T23:40:41.0043917Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseBSR_cuda_int64 PASSED [0.0843s] [ 92%] 2025-08-14T23:40:41.0044335Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseBSR_cuda_int8 PASSED [0.0842s] [ 93%] 2025-08-14T23:40:41.0044817Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseCSC_cuda_bfloat16 PASSED [0.1592s] [ 93%] 2025-08-14T23:40:41.0045241Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseCSC_cuda_bool PASSED [0.0811s] [ 93%] 2025-08-14T23:40:41.0045655Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseCSC_cuda_int16 PASSED [0.0808s] [ 93%] 2025-08-14T23:40:41.0046072Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseCSC_cuda_uint8 PASSED [0.0805s] [ 93%] 2025-08-14T23:40:41.0046589Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseCSR_cuda_bfloat16 PASSED [0.1600s] [ 93%] 2025-08-14T23:40:41.0047018Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseCSR_cuda_float16 PASSED [0.1597s] [ 93%] 2025-08-14T23:40:41.0047331Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_tensor_with_dims_SparseBSC_cuda_bool PASSED [0.0979s] [ 93%] 2025-08-14T23:40:41.0047635Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_tensor_with_dims_SparseBSC_cuda_int16 PASSED [0.0981s] [ 93%] 2025-08-14T23:40:41.0047936Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_tensor_with_dims_SparseBSC_cuda_int32 PASSED [0.0976s] [ 93%] 2025-08-14T23:40:41.0048232Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_tensor_with_dims_SparseBSC_cuda_int64 PASSED [0.0972s] [ 93%] 2025-08-14T23:40:41.0048533Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_tensor_with_dims_SparseBSC_cuda_int8 PASSED [0.0980s] [ 93%] 2025-08-14T23:40:41.0048834Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_tensor_with_dims_SparseBSR_cuda_bool PASSED [0.0978s] [ 93%] 2025-08-14T23:40:41.0049151Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_tensor_with_dims_SparseBSR_cuda_complex64 PASSED [0.1051s] [ 93%] 2025-08-14T23:40:41.0049452Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_tensor_with_dims_SparseBSR_cuda_int16 PASSED [0.0978s] [ 93%] 2025-08-14T23:40:41.0049752Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_tensor_with_dims_SparseBSR_cuda_int64 PASSED [0.0978s] [ 94%] 2025-08-14T23:40:41.0050049Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_tensor_with_dims_SparseBSR_cuda_int8 PASSED [0.0978s] [ 94%] 2025-08-14T23:40:41.0050418Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_tensor_with_dims_SparseCSC_cuda_float16 PASSED [0.0974s] [ 94%] 2025-08-14T23:40:41.0050844Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_tensor_with_dims_SparseCSC_cuda_float32 PASSED [0.0972s] [ 94%] 2025-08-14T23:40:41.0051200Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_tensor_with_dims_SparseCSC_cuda_int64 PASSED [0.0918s] [ 94%] 2025-08-14T23:40:41.0051630Z test_sparse_csr.py::TestSparseCompressedCUDA::test_sparse_compressed_tensor_with_dims_SparseCSC_cuda_uint8 PASSED [0.0914s] [ 94%] 2025-08-14T23:40:41.0051924Z test_sparse_csr.py::TestSparseCompressedCUDA::test_to_dtype_SparseBSC_cuda_bfloat16 PASSED [0.1564s] [ 94%] 2025-08-14T23:40:41.0052206Z test_sparse_csr.py::TestSparseCompressedCUDA::test_to_dtype_SparseBSC_cuda_float16 PASSED [0.1565s] [ 94%] 2025-08-14T23:40:41.0052488Z test_sparse_csr.py::TestSparseCompressedCUDA::test_to_dtype_SparseBSC_cuda_float64 PASSED [0.1564s] [ 94%] 2025-08-14T23:40:41.0052742Z test_sparse_csr.py::TestSparseCompressedCUDA::test_to_dtype_SparseBSC_cuda_int32 PASSED [0.1547s] [ 94%] 2025-08-14T23:40:41.0052973Z test_sparse_csr.py::TestSparseCompressedCUDA::test_to_dtype_SparseBSC_cuda_int8 PASSED [0.1544s] [ 94%] 2025-08-14T23:40:41.0053264Z test_sparse_csr.py::TestSparseCompressedCUDA::test_to_dtype_SparseBSR_cuda_float32 PASSED [0.1544s] [ 94%] 2025-08-14T23:40:41.0053499Z test_sparse_csr.py::TestSparseCompressedCUDA::test_to_dtype_SparseBSR_cuda_float64 PASSED [0.1545s] [ 94%] 2025-08-14T23:40:41.0053727Z test_sparse_csr.py::TestSparseCompressedCUDA::test_to_dtype_SparseBSR_cuda_int16 PASSED [0.1532s] [ 94%] 2025-08-14T23:40:41.0053952Z test_sparse_csr.py::TestSparseCompressedCUDA::test_to_dtype_SparseCSC_cuda_bool PASSED [0.1515s] [ 94%] 2025-08-14T23:40:41.0054191Z test_sparse_csr.py::TestSparseCompressedCUDA::test_to_dtype_SparseCSC_cuda_complex64 PASSED [0.1527s] [ 95%] 2025-08-14T23:40:41.0054421Z test_sparse_csr.py::TestSparseCompressedCUDA::test_to_dtype_SparseCSR_cuda_int16 PASSED [0.1518s] [ 95%] 2025-08-14T23:40:41.0054700Z test_sparse_csr.py::TestSparseCompressedCUDA::test_to_dtype_SparseCSR_cuda_int32 PASSED [0.1531s] [ 95%] 2025-08-14T23:40:41.0054950Z test_sparse_csr.py::TestSparseCompressedCUDA::test_validate_SparseBSC_cuda_bfloat16 PASSED [0.0468s] [ 95%] 2025-08-14T23:40:41.0055176Z test_sparse_csr.py::TestSparseCompressedCUDA::test_validate_SparseBSC_cuda_bool PASSED [0.0413s] [ 95%] 2025-08-14T23:40:41.0055408Z test_sparse_csr.py::TestSparseCompressedCUDA::test_validate_SparseBSC_cuda_int16 PASSED [0.0412s] [ 95%] 2025-08-14T23:40:41.0055632Z test_sparse_csr.py::TestSparseCompressedCUDA::test_validate_SparseBSC_cuda_int8 PASSED [0.0412s] [ 95%] 2025-08-14T23:40:41.0055874Z test_sparse_csr.py::TestSparseCompressedCUDA::test_validate_SparseBSR_cuda_bfloat16 PASSED [0.0465s] [ 95%] 2025-08-14T23:40:41.0056107Z test_sparse_csr.py::TestSparseCompressedCUDA::test_validate_SparseBSR_cuda_float32 PASSED [0.0465s] [ 95%] 2025-08-14T23:40:41.0056345Z test_sparse_csr.py::TestSparseCompressedCUDA::test_validate_SparseBSR_cuda_float64 PASSED [0.0465s] [ 95%] 2025-08-14T23:40:41.0056574Z test_sparse_csr.py::TestSparseCompressedCUDA::test_validate_SparseBSR_cuda_int64 PASSED [0.0415s] [ 95%] 2025-08-14T23:40:41.0056806Z test_sparse_csr.py::TestSparseCompressedCUDA::test_validate_SparseBSR_cuda_uint8 PASSED [0.0413s] [ 95%] 2025-08-14T23:40:41.0057033Z test_sparse_csr.py::TestSparseCompressedCUDA::test_validate_SparseCSC_cuda_bool PASSED [0.0383s] [ 95%] 2025-08-14T23:40:41.0057285Z test_sparse_csr.py::TestSparseCompressedCUDA::test_validate_SparseCSC_cuda_complex128 PASSED [0.0447s] [ 95%] 2025-08-14T23:40:41.0057525Z test_sparse_csr.py::TestSparseCompressedCUDA::test_validate_SparseCSC_cuda_complex64 PASSED [0.0448s] [ 95%] 2025-08-14T23:40:41.0057760Z test_sparse_csr.py::TestSparseCompressedCUDA::test_validate_SparseCSC_cuda_float32 PASSED [0.0435s] [ 96%] 2025-08-14T23:40:41.0057986Z test_sparse_csr.py::TestSparseCompressedCUDA::test_validate_SparseCSC_cuda_int32 PASSED [0.0382s] [ 96%] 2025-08-14T23:40:41.0058219Z test_sparse_csr.py::TestSparseCompressedCUDA::test_validate_SparseCSC_cuda_int64 PASSED [0.0381s] [ 96%] 2025-08-14T23:40:41.0058518Z test_sparse_csr.py::TestSparseCompressedCUDA::test_validate_SparseCSR_cuda_bfloat16 PASSED [0.0435s] [ 96%] 2025-08-14T23:40:41.0058764Z test_sparse_csr.py::TestSparseCompressedCUDA::test_validate_SparseCSR_cuda_complex64 PASSED [0.0448s] [ 96%] 2025-08-14T23:40:41.0058991Z test_sparse_csr.py::TestSparseCompressedCUDA::test_validate_SparseCSR_cuda_int16 PASSED [0.0383s] [ 96%] 2025-08-14T23:40:41.0059307Z test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_TensorAsKey_cuda PASSED [0.0219s] [ 96%] 2025-08-14T23:40:41.0059754Z test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_bsr_dense_bmm_block_size_32_int32_cuda_float16 SKIPPED [0.0002s] (Skipped for internal with remote GPUs) [ 96%] 2025-08-14T23:40:41.0060221Z test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_bsr_dense_bmm_block_size_32_int64_cuda_bfloat16 SKIPPED [0.0002s] (Skipped for internal with remote GPUs) [ 96%] 2025-08-14T23:40:41.0060745Z test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_bsr_dense_bmm_block_size_32_int64_cuda_float16 SKIPPED [0.0001s] (Skipped for internal with remote GPUs) [ 96%] 2025-08-14T23:40:41.0061345Z test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_bsr_dense_bmm_block_size_64_int32_cuda_bfloat16 SKIPPED [0.0001s] (Skipped for internal with remote GPUs) [ 96%] 2025-08-14T23:40:41.0061874Z test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_bsr_dense_bmm_block_size_64_int64_cuda_bfloat16 SKIPPED [0.0001s] (Skipped for internal with remote GPUs) [ 96%] 2025-08-14T23:40:41.0062274Z test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_bsr_dense_bmm_error_messages_cuda_float16 PASSED [0.0039s] [ 96%] 2025-08-14T23:40:41.0062666Z test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_bsr_scatter_mm_blocksize_16_cuda_float16 PASSED [22.0884s] [ 96%] 2025-08-14T23:40:41.0063077Z test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_bsr_scatter_mm_blocksize_16x32_cuda_float16 PASSED [21.4308s] [ 96%] 2025-08-14T23:40:41.0063414Z test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_bsr_scatter_mm_blocksize_2_cuda_bfloat16 PASSED [2.7145s] [ 97%] 2025-08-14T23:40:41.0063747Z test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_bsr_scatter_mm_blocksize_2x3_cuda_bfloat16 PASSED [2.4376s] [ 97%] 2025-08-14T23:40:41.0064230Z test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_bsr_scatter_mm_blocksize_64_cuda_bfloat16 SKIPPED [0.0004s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 97%] 2025-08-14T23:40:41.0064703Z test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_bsr_scatter_mm_blocksize_64_cuda_float32 SKIPPED [0.0003s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 97%] 2025-08-14T23:40:41.0065140Z test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_bsr_softmax_cuda_float32 SKIPPED [0.0003s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 97%] 2025-08-14T23:40:41.0065625Z test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op__int_bsr_dense_addmm_blocksize_16_out_dtype_int32_cuda_bfloat16 SKIPPED [0.0009s] (out dtype not implemented) [ 97%] 2025-08-14T23:40:41.0066095Z test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op__int_bsr_dense_addmm_blocksize_16_out_dtype_int32_cuda_int8 SKIPPED [0.0008s] (out dtype not implemented) [ 97%] 2025-08-14T23:40:41.0066689Z test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op__int_bsr_dense_addmm_blocksize_16_out_dtype_unspecified_cuda_int8 SKIPPED [0.0010s] (triton kernel does not support support int8 blocks smaller than 32) [ 97%] 2025-08-14T23:40:41.0067184Z test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op__int_bsr_dense_addmm_blocksize_16x32_out_dtype_int32_cuda_bfloat16 SKIPPED [0.0008s] (out dtype not implemented) [ 97%] 2025-08-14T23:40:41.0067727Z test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op__int_bsr_dense_addmm_blocksize_16x32_out_dtype_int32_cuda_float32 SKIPPED [0.0008s] (out dtype not implemented) [ 97%] 2025-08-14T23:40:41.0068203Z test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op__int_bsr_dense_addmm_blocksize_16x32_out_dtype_int32_cuda_int8 SKIPPED [0.0008s] (out dtype not implemented) [ 97%] 2025-08-14T23:40:41.0068850Z test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op__int_bsr_dense_addmm_blocksize_16x32_out_dtype_unspecified_cuda_float32 SKIPPED [0.0008s] (Redundant test: _int_bsr_dense_addmm on torch.float32 tensors) [ 97%] 2025-08-14T23:40:41.0069445Z test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op__int_bsr_dense_addmm_blocksize_16x32_out_dtype_unspecified_cuda_int8 SKIPPED [0.0008s] (triton kernel does not support support int8 blocks smaller than 32) [ 97%] 2025-08-14T23:40:41.0069925Z test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op__int_bsr_dense_addmm_blocksize_32_out_dtype_int32_cuda_float32 SKIPPED [0.0008s] (out dtype not implemented) [ 97%] 2025-08-14T23:40:41.0070565Z test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op__int_bsr_dense_addmm_blocksize_32_out_dtype_unspecified_cuda_bfloat16 SKIPPED [0.0009s] (Redundant test: _int_bsr_dense_addmm on torch.bfloat16 tensors) [ 97%] 2025-08-14T23:40:41.0071153Z test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op__int_bsr_dense_addmm_blocksize_32_out_dtype_unspecified_cuda_float32 SKIPPED [0.0008s] (Redundant test: _int_bsr_dense_addmm on torch.float32 tensors) [ 98%] 2025-08-14T23:40:41.0071618Z test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op_bsr_dense_addmm_blocksize_16_out_dtype_int32_cuda_bfloat16 SKIPPED [0.0008s] (incompatible out dtype) [ 98%] 2025-08-14T23:40:41.0072131Z test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op_bsr_dense_addmm_blocksize_16_out_dtype_int32_cuda_float16 SKIPPED [0.0008s] (incompatible out dtype) [ 98%] 2025-08-14T23:40:41.0072597Z test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op_bsr_dense_addmm_blocksize_16_out_dtype_int32_cuda_float32 SKIPPED [0.0008s] (incompatible out dtype) [ 98%] 2025-08-14T23:40:41.0073017Z test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op_bsr_dense_addmm_blocksize_16_out_dtype_unspecified_cuda_float16 PASSED [42.2712s] [ 98%] 2025-08-14T23:40:41.0073598Z test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op_bsr_dense_addmm_blocksize_16_out_dtype_unspecified_cuda_int8 SKIPPED [0.0011s] (triton kernel does not support support int8 blocks smaller than 32) [ 98%] 2025-08-14T23:40:41.0074069Z test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op_bsr_dense_addmm_blocksize_16x32_out_dtype_int32_cuda_float32 SKIPPED [0.0008s] (incompatible out dtype) [ 98%] 2025-08-14T23:40:41.0074497Z test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op_bsr_dense_addmm_blocksize_16x32_out_dtype_unspecified_cuda_float16 PASSED [39.6102s] [ 98%] 2025-08-14T23:40:41.0074959Z test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op_bsr_dense_addmm_blocksize_32_out_dtype_int32_cuda_bfloat16 SKIPPED [0.0011s] (incompatible out dtype) [ 98%] 2025-08-14T23:40:41.0075382Z test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op_bsr_dense_linear_blocksize_16_out_dtype_unspecified_cuda_bfloat16 PASSED [0.0757s] [ 98%] 2025-08-14T23:40:41.0075796Z test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op_bsr_dense_linear_blocksize_16_out_dtype_unspecified_cuda_float16 PASSED [0.0746s] [ 98%] 2025-08-14T23:40:41.0076279Z test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op_bsr_dense_linear_blocksize_16x32_out_dtype_int32_cuda_float16 SKIPPED [0.0008s] (out dtype not implemented) [ 98%] 2025-08-14T23:40:41.0076889Z test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op_bsr_dense_linear_blocksize_16x32_out_dtype_unspecified_cuda_int8 SKIPPED [0.0008s] (bsr_dense_linear does not support non-square blocks) [ 98%] 2025-08-14T23:40:41.0077354Z test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op_bsr_dense_linear_blocksize_32_out_dtype_int32_cuda_int8 SKIPPED [0.0008s] (out dtype not implemented) [ 98%] 2025-08-14T23:40:41.0077822Z test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op_bsr_dense_linear_blocksize_32_out_dtype_unspecified_cuda_float16 PASSED [0.0751s] [ 98%] 2025-08-14T23:40:41.0078270Z test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op_bsr_dense_mm_blocksize_16_out_dtype_int32_cuda_int8 SKIPPED [0.0008s] (out dtype not implemented) [ 99%] 2025-08-14T23:40:41.0078740Z test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op_bsr_dense_mm_blocksize_16x32_out_dtype_int32_cuda_bfloat16 SKIPPED [0.0008s] (out dtype not implemented) [ 99%] 2025-08-14T23:40:41.0079252Z test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op_bsr_dense_mm_blocksize_16x32_out_dtype_int32_cuda_float16 SKIPPED [0.0008s] (out dtype not implemented) [ 99%] 2025-08-14T23:40:41.0079725Z test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op_bsr_dense_mm_blocksize_16x32_out_dtype_int32_cuda_float32 SKIPPED [0.0008s] (out dtype not implemented) [ 99%] 2025-08-14T23:40:41.0080133Z test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op_bsr_dense_mm_blocksize_16x32_out_dtype_unspecified_cuda_bfloat16 PASSED [1.5005s] [ 99%] 2025-08-14T23:40:41.0080693Z test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op_bsr_dense_mm_blocksize_32_out_dtype_int32_cuda_int8 SKIPPED [0.0011s] (out dtype not implemented) [ 99%] 2025-08-14T23:40:41.0081094Z test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_kernel_op_bsr_dense_mm_blocksize_32_out_dtype_unspecified_cuda_int8 PASSED [1.5355s] [ 99%] 2025-08-14T23:40:41.0081441Z test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_sampled_addmm_block_size_32_cuda_bfloat16 PASSED [24.6042s] [ 99%] 2025-08-14T23:40:41.0081774Z test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_sampled_addmm_block_size_32_cuda_float16 PASSED [24.2516s] [ 99%] 2025-08-14T23:40:41.0082303Z test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_scaled_dot_product_attention_block_size_32_cuda_bfloat16 SKIPPED [0.0011s] (skipIfRocm: test doesn't currently work on the ROCm stack) [ 99%] 2025-08-14T23:40:41.0082823Z test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_scaled_dot_product_attention_block_size_64_cuda_float32 SKIPPED [0.0008s] (skipIfRocm: test doesn't currently work on the ROCm stack) [ 99%] 2025-08-14T23:40:41.0083126Z test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_scatter_mm_cuda_bfloat16 PASSED [0.7035s] [ 99%] 2025-08-14T23:40:41.0083566Z test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_tune_op__int_bsr_dense_addmm_out_dtype_int32_cuda_int8 SKIPPED [0.0011s] (out dtype not implemented) [ 99%] 2025-08-14T23:40:41.0083954Z test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_tune_op__int_bsr_dense_addmm_out_dtype_unspecified_cuda_bfloat16 PASSED [4.0911s] [ 99%] 2025-08-14T23:40:41.0084386Z test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_tune_op_bsr_dense_addmm_out_dtype_int32_cuda_bfloat16 SKIPPED [0.0010s] (incompatible out dtype) [ 99%] 2025-08-14T23:40:41.0084806Z test_sparse_csr.py::TestSparseCompressedTritonKernelsCUDA::test_triton_tune_op_bsr_dense_addmm_out_dtype_int32_cuda_float16 SKIPPED [0.0008s] (incompatible out dtype) [100%] 2025-08-14T23:40:41.0084811Z 2025-08-14T23:40:41.0085178Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/test_sparse_csr/test_sparse_csr-6893d71aee8f1add.xml - 2025-08-14T23:40:41.0085396Z ======== 1324 passed, 189 skipped, 153 deselected in 237.60s (0:03:57) ========= 2025-08-14T23:40:41.0085775Z The following tests failed and then succeeded when run in a new process['test/test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseBSC_int32_cuda_bool'] 2025-08-14T23:40:41.0085859Z 2025-08-14T23:40:41.0086117Z FINISHED PRINTING LOG FILE of test_sparse_csr 3/3 (test/test-reports/test_sparse_csr_3.3_febce375c1d07f5e_.log) 2025-08-14T23:40:41.0086121Z 2025-08-14T23:40:41.0086246Z Running test_stateless 1/1 ... [2025-08-14 23:40:40.793680] 2025-08-14T23:40:41.0086331Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T23:40:41.0086914Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_stateless.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 23:40:40.794032] 2025-08-14T23:40:48.8234062Z 2025-08-14T23:40:48.8234891Z test_stateless 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_stateless_1.1_1ddc929fe34dd3ab_.log 2025-08-14T23:40:48.8252559Z Running 50 items in this shard: test/test_stateless.py::TestStatelessFunctionalAPI::test_circular_references_stateless, test/test_stateless.py::TestStatelessFunctionalAPI::test_circular_references_torch_func, test/test_stateless.py::TestStatelessFunctionalAPI::test_functional_batch_norm_stateless, test/test_stateless.py::TestStatelessFunctionalAPI::test_functional_batch_norm_torch_func, test/test_stateless.py::TestStatelessFunctionalAPI::test_functional_call_member_reference_stateless, test/test_stateless.py::TestStatelessFunctionalAPI::test_functional_call_member_reference_torch_func, test/test_stateless.py::TestStatelessFunctionalAPI::test_functional_call_multiple_dicts_error, test/test_stateless.py::TestStatelessFunctionalAPI::test_functional_call_stateless, test/test_stateless.py::TestStatelessFunctionalAPI::test_functional_call_torch_func, test/test_stateless.py::TestStatelessFunctionalAPI::test_functional_call_tuple_dicts, test/test_stateless.py::TestStatelessFunctionalAPI::test_functional_call_with_data_parallel_error_stateless, test/test_stateless.py::TestStatelessFunctionalAPI::test_functional_call_with_data_parallel_error_torch_func, test/test_stateless.py::TestStatelessFunctionalAPI::test_functional_call_with_data_parallel_stateless, test/test_stateless.py::TestStatelessFunctionalAPI::test_functional_call_with_data_parallel_torch_func, test/test_stateless.py::TestStatelessFunctionalAPI::test_functional_call_with_gradient_stateless, test/test_stateless.py::TestStatelessFunctionalAPI::test_functional_call_with_gradient_torch_func, test/test_stateless.py::TestStatelessFunctionalAPI::test_functional_call_with_jit_stateless, test/test_stateless.py::TestStatelessFunctionalAPI::test_functional_call_with_jit_torch_func, test/test_stateless.py::TestStatelessFunctionalAPI::test_functional_call_with_kwargs_stateless, test/test_stateless.py::TestStatelessFunctionalAPI::test_functional_call_with_kwargs_torch_func, test/test_stateless.py::TestStatelessFunctionalAPI::test_in_place_operator_stateless, test/test_stateless.py::TestStatelessFunctionalAPI::test_in_place_operator_torch_func, test/test_stateless.py::TestStatelessFunctionalAPI::test_reparametrize_module_fail_reset_to_original_stateless, test/test_stateless.py::TestStatelessFunctionalAPI::test_reparametrize_module_fail_reset_to_original_torch_func, test/test_stateless.py::TestStatelessFunctionalAPI::test_reparametrize_some_weights_stateless, test/test_stateless.py::TestStatelessFunctionalAPI::test_reparametrize_some_weights_torch_func, test/test_stateless.py::TestStatelessFunctionalAPI::test_reparametrize_special_stateless, test/test_stateless.py::TestStatelessFunctionalAPI::test_reparametrize_special_torch_func, test/test_stateless.py::TestStatelessFunctionalAPI::test_reparametrize_strict_stateless, test/test_stateless.py::TestStatelessFunctionalAPI::test_reparametrize_strict_torch_func, test/test_stateless.py::TestStatelessFunctionalAPI::test_reparametrize_tie_some_weights_stateless, test/test_stateless.py::TestStatelessFunctionalAPI::test_reparametrize_tie_some_weights_torch_func, test/test_stateless.py::TestStatelessFunctionalAPI::test_reparametrize_tie_weights_stateless, test/test_stateless.py::TestStatelessFunctionalAPI::test_reparametrize_tie_weights_strict_stateless, test/test_stateless.py::TestStatelessFunctionalAPI::test_reparametrize_tie_weights_strict_torch_func, test/test_stateless.py::TestStatelessFunctionalAPI::test_reparametrize_tie_weights_torch_func, test/test_stateless.py::TestStatelessFunctionalAPI::test_reparametrized_module_change_parametrization_original_stateless, test/test_stateless.py::TestStatelessFunctionalAPI::test_reparametrized_module_change_parametrization_original_torch_func, test/test_stateless.py::TestStatelessFunctionalAPI::test_setattr_stateless, test/test_stateless.py::TestStatelessFunctionalAPI::test_setattr_strict_stateless, test/test_stateless.py::TestStatelessFunctionalAPI::test_setattr_strict_torch_func, test/test_stateless.py::TestStatelessFunctionalAPI::test_setattr_torch_func, test/test_stateless.py::TestStatelessFunctionalAPI::test_tied_weights_errors_stateless, test/test_stateless.py::TestStatelessFunctionalAPI::test_tied_weights_errors_torch_func, test/test_stateless.py::TestStatelessFunctionalAPI::test_tied_weights_no_error_without_flag, test/test_stateless.py::TestStatelessFunctionalAPI::test_tied_weights_warns_stateless, test/test_stateless.py::TestStatelessFunctionalAPI::test_tied_weights_warns_torch_func, test/test_stateless.py::TestStatelessDeprecation::test_private_stateless_warns, test/test_stateless.py::TestStatelessDeprecation::test_stateless_functional_call_warns, test/test_stateless.py::TestPythonOptimizeMode::test_runs_with_optimize_flag 2025-08-14T23:40:48.8269396Z 2025-08-14T23:40:48.8269632Z Running test_subclass 1/1 ... [2025-08-14 23:40:48.823291] 2025-08-14T23:40:48.8269999Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T23:40:48.8270909Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_subclass.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 23:40:48.823897] 2025-08-14T23:40:52.7982470Z 2025-08-14T23:40:52.7983501Z test_subclass 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_subclass_1.1_c420b331af2e3a1c_.log 2025-08-14T23:40:52.8014410Z Running 100 items in this shard: test/test_subclass.py::TestSubclass::test_deepcopy_base_tensor_as_param_False, test/test_subclass.py::TestSubclass::test_deepcopy_base_tensor_as_param_True, test/test_subclass.py::TestSubclass::test_deepcopy_diag_tensor_below_as_param_False, test/test_subclass.py::TestSubclass::test_deepcopy_diag_tensor_below_as_param_True, test/test_subclass.py::TestSubclass::test_deepcopy_logging_tensor_as_param_False, test/test_subclass.py::TestSubclass::test_deepcopy_logging_tensor_as_param_True, test/test_subclass.py::TestSubclass::test_deepcopy_non_wrapper_tensor_as_param_False, test/test_subclass.py::TestSubclass::test_deepcopy_non_wrapper_tensor_as_param_True, test/test_subclass.py::TestSubclass::test_deepcopy_sparse_tensor_as_param_False, test/test_subclass.py::TestSubclass::test_deepcopy_sparse_tensor_as_param_True, test/test_subclass.py::TestSubclass::test_deepcopy_wrapper_with_custom_sizes_as_param_False, test/test_subclass.py::TestSubclass::test_deepcopy_wrapper_with_custom_sizes_as_param_True, test/test_subclass.py::TestSubclass::test_deepcopy_wrapper_with_custom_strides_as_param_False, test/test_subclass.py::TestSubclass::test_deepcopy_wrapper_with_custom_strides_as_param_True, test/test_subclass.py::TestSubclass::test_lazy_module_base_tensor, test/test_subclass.py::TestSubclass::test_lazy_module_diag_tensor_below, test/test_subclass.py::TestSubclass::test_lazy_module_logging_tensor, test/test_subclass.py::TestSubclass::test_lazy_module_non_wrapper_tensor, test/test_subclass.py::TestSubclass::test_lazy_module_sparse_tensor, test/test_subclass.py::TestSubclass::test_lazy_module_wrapper_with_custom_sizes, test/test_subclass.py::TestSubclass::test_lazy_module_wrapper_with_custom_strides, test/test_subclass.py::TestSubclass::test_module_optimization_base_tensor, test/test_subclass.py::TestSubclass::test_module_optimization_diag_tensor_below, test/test_subclass.py::TestSubclass::test_module_optimization_logging_tensor, test/test_subclass.py::TestSubclass::test_module_optimization_non_wrapper_tensor, test/test_subclass.py::TestSubclass::test_module_optimization_sparse_tensor, test/test_subclass.py::TestSubclass::test_module_optimization_wrapper_with_custom_sizes, test/test_subclass.py::TestSubclass::test_module_optimization_wrapper_with_custom_strides, test/test_subclass.py::TestSubclass::test_non_rewrapping_torch_dispatch_subclass_as_parameter_throws_for_detach, test/test_subclass.py::TestSubclass::test_param_invariants_base_tensor_tensor_requires_grad_False, test/test_subclass.py::TestSubclass::test_param_invariants_base_tensor_tensor_requires_grad_True, test/test_subclass.py::TestSubclass::test_param_invariants_diag_tensor_below_tensor_requires_grad_False, test/test_subclass.py::TestSubclass::test_param_invariants_diag_tensor_below_tensor_requires_grad_True, test/test_subclass.py::TestSubclass::test_param_invariants_logging_tensor_tensor_requires_grad_False, test/test_subclass.py::TestSubclass::test_param_invariants_logging_tensor_tensor_requires_grad_True, test/test_subclass.py::TestSubclass::test_param_invariants_non_wrapper_tensor_tensor_requires_grad_False, test/test_subclass.py::TestSubclass::test_param_invariants_non_wrapper_tensor_tensor_requires_grad_True, test/test_subclass.py::TestSubclass::test_param_invariants_sparse_tensor_tensor_requires_grad_False, test/test_subclass.py::TestSubclass::test_param_invariants_sparse_tensor_tensor_requires_grad_True, test/test_subclass.py::TestSubclass::test_param_invariants_wrapper_with_custom_sizes_tensor_requires_grad_False, test/test_subclass.py::TestSubclass::test_param_invariants_wrapper_with_custom_sizes_tensor_requires_grad_True, test/test_subclass.py::TestSubclass::test_param_invariants_wrapper_with_custom_strides_tensor_requires_grad_False, test/test_subclass.py::TestSubclass::test_param_invariants_wrapper_with_custom_strides_tensor_requires_grad_True, test/test_subclass.py::TestSubclass::test_parametrization_base_tensor_leave_parametrized_False, test/test_subclass.py::TestSubclass::test_parametrization_base_tensor_leave_parametrized_True, test/test_subclass.py::TestSubclass::test_parametrization_diag_tensor_below_leave_parametrized_False, test/test_subclass.py::TestSubclass::test_parametrization_diag_tensor_below_leave_parametrized_True, test/test_subclass.py::TestSubclass::test_parametrization_logging_tensor_leave_parametrized_False, test/test_subclass.py::TestSubclass::test_parametrization_logging_tensor_leave_parametrized_True, test/test_subclass.py::TestSubclass::test_parametrization_non_wrapper_tensor_leave_parametrized_False, test/test_subclass.py::TestSubclass::test_parametrization_non_wrapper_tensor_leave_parametrized_True, test/test_subclass.py::TestSubclass::test_parametrization_sparse_tensor_leave_parametrized_False, test/test_subclass.py::TestSubclass::test_parametrization_sparse_tensor_leave_parametrized_True, test/test_subclass.py::TestSubclass::test_parametrization_wrapper_with_custom_sizes_leave_parametrized_False, test/test_subclass.py::TestSubclass::test_parametrization_wrapper_with_custom_sizes_leave_parametrized_True, test/test_subclass.py::TestSubclass::test_parametrization_wrapper_with_custom_strides_leave_parametrized_False, test/test_subclass.py::TestSubclass::test_parametrization_wrapper_with_custom_strides_leave_parametrized_True, test/test_subclass.py::TestSubclass::test_repr_base_tensor_as_param_False, test/test_subclass.py::TestSubclass::test_repr_base_tensor_as_param_True, test/test_subclass.py::TestSubclass::test_repr_diag_tensor_below_as_param_False, test/test_subclass.py::TestSubclass::test_repr_diag_tensor_below_as_param_True, test/test_subclass.py::TestSubclass::test_repr_logging_tensor_as_param_False, test/test_subclass.py::TestSubclass::test_repr_logging_tensor_as_param_True, test/test_subclass.py::TestSubclass::test_repr_non_wrapper_tensor_as_param_False, test/test_subclass.py::TestSubclass::test_repr_non_wrapper_tensor_as_param_True, test/test_subclass.py::TestSubclass::test_repr_sparse_tensor_as_param_False, test/test_subclass.py::TestSubclass::test_repr_sparse_tensor_as_param_True, test/test_subclass.py::TestSubclass::test_repr_wrapper_with_custom_sizes_as_param_False, test/test_subclass.py::TestSubclass::test_repr_wrapper_with_custom_sizes_as_param_True, test/test_subclass.py::TestSubclass::test_repr_wrapper_with_custom_strides_as_param_False, test/test_subclass.py::TestSubclass::test_repr_wrapper_with_custom_strides_as_param_True, test/test_subclass.py::TestSubclass::test_serialization_base_tensor_as_param_False, test/test_subclass.py::TestSubclass::test_serialization_base_tensor_as_param_True, test/test_subclass.py::TestSubclass::test_serialization_diag_tensor_below_as_param_False, test/test_subclass.py::TestSubclass::test_serialization_diag_tensor_below_as_param_True, test/test_subclass.py::TestSubclass::test_serialization_logging_tensor_as_param_False, test/test_subclass.py::TestSubclass::test_serialization_logging_tensor_as_param_True, test/test_subclass.py::TestSubclass::test_serialization_non_wrapper_tensor_as_param_False, test/test_subclass.py::TestSubclass::test_serialization_non_wrapper_tensor_as_param_True, test/test_subclass.py::TestSubclass::test_serialization_sparse_tensor_as_param_False, test/test_subclass.py::TestSubclass::test_serialization_sparse_tensor_as_param_True, test/test_subclass.py::TestSubclass::test_serialization_wrapper_with_custom_sizes_as_param_False, test/test_subclass.py::TestSubclass::test_serialization_wrapper_with_custom_sizes_as_param_True, test/test_subclass.py::TestSubclass::test_serialization_wrapper_with_custom_strides_as_param_False, test/test_subclass.py::TestSubclass::test_serialization_wrapper_with_custom_strides_as_param_True, test/test_subclass.py::TestSubclass::test_tensor_subclass_storage_data_accesses_throw, test/test_subclass.py::TestSubclass::test_type_propagation_base_tensor_as_param_False, test/test_subclass.py::TestSubclass::test_type_propagation_base_tensor_as_param_True, test/test_subclass.py::TestSubclass::test_type_propagation_diag_tensor_below_as_param_False, test/test_subclass.py::TestSubclass::test_type_propagation_diag_tensor_below_as_param_True, test/test_subclass.py::TestSubclass::test_type_propagation_logging_tensor_as_param_False, test/test_subclass.py::TestSubclass::test_type_propagation_logging_tensor_as_param_True, test/test_subclass.py::TestSubclass::test_type_propagation_non_wrapper_tensor_as_param_False, test/test_subclass.py::TestSubclass::test_type_propagation_non_wrapper_tensor_as_param_True, test/test_subclass.py::TestSubclass::test_type_propagation_sparse_tensor_as_param_False, test/test_subclass.py::TestSubclass::test_type_propagation_sparse_tensor_as_param_True, test/test_subclass.py::TestSubclass::test_type_propagation_wrapper_with_custom_sizes_as_param_False, test/test_subclass.py::TestSubclass::test_type_propagation_wrapper_with_custom_sizes_as_param_True, test/test_subclass.py::TestSubclass::test_type_propagation_wrapper_with_custom_strides_as_param_False, test/test_subclass.py::TestSubclass::test_type_propagation_wrapper_with_custom_strides_as_param_True 2025-08-14T23:40:52.8044412Z 2025-08-14T23:40:52.8044588Z Running test_sympy_utils 1/1 ... [2025-08-14 23:40:52.798336] 2025-08-14T23:40:52.8044952Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T23:40:52.8045878Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_sympy_utils.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 23:40:52.798905] 2025-08-14T23:41:15.3144112Z 2025-08-14T23:41:15.3145354Z test_sympy_utils 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_sympy_utils_1.1_afd5cdf829d5cf68_.log 2025-08-14T23:41:15.3197881Z Running 209 items in this shard: test/test_sympy_utils.py::TestNumbers::test_float_cast, test/test_sympy_utils.py::TestNumbers::test_int_infinity, test/test_sympy_utils.py::TestNumbers::test_lt_self, test/test_sympy_utils.py::TestNumbers::test_mixed_oo_int_oo, test/test_sympy_utils.py::TestNumbers::test_relation, test/test_sympy_utils.py::TestValueRanges::test_binary_bool_ref_range_fn_and_, test/test_sympy_utils.py::TestValueRanges::test_binary_bool_ref_range_fn_bitwise_and, test/test_sympy_utils.py::TestValueRanges::test_binary_bool_ref_range_fn_bitwise_or, test/test_sympy_utils.py::TestValueRanges::test_binary_bool_ref_range_fn_or_, test/test_sympy_utils.py::TestValueRanges::test_binary_ref_fn_add_dtype_float, test/test_sympy_utils.py::TestValueRanges::test_binary_ref_fn_add_dtype_int, test/test_sympy_utils.py::TestValueRanges::test_binary_ref_fn_bitwise_and_dtype_float, test/test_sympy_utils.py::TestValueRanges::test_binary_ref_fn_bitwise_and_dtype_int, test/test_sympy_utils.py::TestValueRanges::test_binary_ref_fn_bitwise_or_dtype_float, test/test_sympy_utils.py::TestValueRanges::test_binary_ref_fn_bitwise_or_dtype_int, test/test_sympy_utils.py::TestValueRanges::test_binary_ref_fn_floordiv_dtype_float, test/test_sympy_utils.py::TestValueRanges::test_binary_ref_fn_floordiv_dtype_int, test/test_sympy_utils.py::TestValueRanges::test_binary_ref_fn_maximum_dtype_float, test/test_sympy_utils.py::TestValueRanges::test_binary_ref_fn_maximum_dtype_int, test/test_sympy_utils.py::TestValueRanges::test_binary_ref_fn_minimum_dtype_float, test/test_sympy_utils.py::TestValueRanges::test_binary_ref_fn_minimum_dtype_int, test/test_sympy_utils.py::TestValueRanges::test_binary_ref_fn_mod_dtype_float, test/test_sympy_utils.py::TestValueRanges::test_binary_ref_fn_mod_dtype_int, test/test_sympy_utils.py::TestValueRanges::test_binary_ref_fn_mul_dtype_float, test/test_sympy_utils.py::TestValueRanges::test_binary_ref_fn_mul_dtype_int, test/test_sympy_utils.py::TestValueRanges::test_binary_ref_fn_pow_by_natural_dtype_float, test/test_sympy_utils.py::TestValueRanges::test_binary_ref_fn_pow_by_natural_dtype_int, test/test_sympy_utils.py::TestValueRanges::test_binary_ref_fn_pow_dtype_float, test/test_sympy_utils.py::TestValueRanges::test_binary_ref_fn_pow_dtype_int, test/test_sympy_utils.py::TestValueRanges::test_binary_ref_fn_sub_dtype_float, test/test_sympy_utils.py::TestValueRanges::test_binary_ref_fn_sub_dtype_int, test/test_sympy_utils.py::TestValueRanges::test_binary_ref_fn_truediv_dtype_float, test/test_sympy_utils.py::TestValueRanges::test_binary_ref_fn_truediv_dtype_int, test/test_sympy_utils.py::TestValueRanges::test_binary_ref_range_fn_add, test/test_sympy_utils.py::TestValueRanges::test_binary_ref_range_fn_bitwise_and, test/test_sympy_utils.py::TestValueRanges::test_binary_ref_range_fn_bitwise_or, test/test_sympy_utils.py::TestValueRanges::test_binary_ref_range_fn_eq, test/test_sympy_utils.py::TestValueRanges::test_binary_ref_range_fn_floordiv, test/test_sympy_utils.py::TestValueRanges::test_binary_ref_range_fn_ge, test/test_sympy_utils.py::TestValueRanges::test_binary_ref_range_fn_gt, test/test_sympy_utils.py::TestValueRanges::test_binary_ref_range_fn_le, test/test_sympy_utils.py::TestValueRanges::test_binary_ref_range_fn_lt, test/test_sympy_utils.py::TestValueRanges::test_binary_ref_range_fn_maximum, test/test_sympy_utils.py::TestValueRanges::test_binary_ref_range_fn_minimum, test/test_sympy_utils.py::TestValueRanges::test_binary_ref_range_fn_mod, test/test_sympy_utils.py::TestValueRanges::test_binary_ref_range_fn_mul, test/test_sympy_utils.py::TestValueRanges::test_binary_ref_range_fn_ne, test/test_sympy_utils.py::TestValueRanges::test_binary_ref_range_fn_pow, test/test_sympy_utils.py::TestValueRanges::test_binary_ref_range_fn_pow_by_natural, test/test_sympy_utils.py::TestValueRanges::test_binary_ref_range_fn_sub, test/test_sympy_utils.py::TestValueRanges::test_binary_ref_range_fn_truediv, test/test_sympy_utils.py::TestValueRanges::test_bitwise_ref_range_fn_bitwise_and, test/test_sympy_utils.py::TestValueRanges::test_bitwise_ref_range_fn_bitwise_or, test/test_sympy_utils.py::TestValueRanges::test_mul_zero_unknown, test/test_sympy_utils.py::TestValueRanges::test_pow_half, test/test_sympy_utils.py::TestValueRanges::test_unary_bool_ref_range_fn_not_, test/test_sympy_utils.py::TestValueRanges::test_unary_ref_fn_abs_dtype_float, test/test_sympy_utils.py::TestValueRanges::test_unary_ref_fn_abs_dtype_int, test/test_sympy_utils.py::TestValueRanges::test_unary_ref_fn_ceil_dtype_float, test/test_sympy_utils.py::TestValueRanges::test_unary_ref_fn_ceil_dtype_int, test/test_sympy_utils.py::TestValueRanges::test_unary_ref_fn_exp_dtype_float, test/test_sympy_utils.py::TestValueRanges::test_unary_ref_fn_exp_dtype_int, test/test_sympy_utils.py::TestValueRanges::test_unary_ref_fn_floor_dtype_float, test/test_sympy_utils.py::TestValueRanges::test_unary_ref_fn_floor_dtype_int, test/test_sympy_utils.py::TestValueRanges::test_unary_ref_fn_log_dtype_float, test/test_sympy_utils.py::TestValueRanges::test_unary_ref_fn_log_dtype_int, test/test_sympy_utils.py::TestValueRanges::test_unary_ref_fn_neg_dtype_float, test/test_sympy_utils.py::TestValueRanges::test_unary_ref_fn_neg_dtype_int, test/test_sympy_utils.py::TestValueRanges::test_unary_ref_fn_reciprocal_dtype_float, test/test_sympy_utils.py::TestValueRanges::test_unary_ref_fn_reciprocal_dtype_int, test/test_sympy_utils.py::TestValueRanges::test_unary_ref_fn_sqrt_dtype_float, test/test_sympy_utils.py::TestValueRanges::test_unary_ref_fn_sqrt_dtype_int, test/test_sympy_utils.py::TestValueRanges::test_unary_ref_fn_square_dtype_float, test/test_sympy_utils.py::TestValueRanges::test_unary_ref_fn_square_dtype_int, test/test_sympy_utils.py::TestValueRanges::test_unary_ref_range_fn_abs, test/test_sympy_utils.py::TestValueRanges::test_unary_ref_range_fn_ceil, test/test_sympy_utils.py::TestValueRanges::test_unary_ref_range_fn_exp, test/test_sympy_utils.py::TestValueRanges::test_unary_ref_range_fn_floor, test/test_sympy_utils.py::TestValueRanges::test_unary_ref_range_fn_log, test/test_sympy_utils.py::TestValueRanges::test_unary_ref_range_fn_neg, test/test_sympy_utils.py::TestValueRanges::test_unary_ref_range_fn_reciprocal, test/test_sympy_utils.py::TestValueRanges::test_unary_ref_range_fn_sqrt, test/test_sympy_utils.py::TestValueRanges::test_unary_ref_range_fn_square, test/test_sympy_utils.py::TestSympyInterp::test_interp_fn_abs, test/test_sympy_utils.py::TestSympyInterp::test_interp_fn_add, test/test_sympy_utils.py::TestSympyInterp::test_interp_fn_and_, test/test_sympy_utils.py::TestSympyInterp::test_interp_fn_bitwise_and, test/test_sympy_utils.py::TestSympyInterp::test_interp_fn_bitwise_or, test/test_sympy_utils.py::TestSympyInterp::test_interp_fn_ceil, test/test_sympy_utils.py::TestSympyInterp::test_interp_fn_eq, test/test_sympy_utils.py::TestSympyInterp::test_interp_fn_exp, test/test_sympy_utils.py::TestSympyInterp::test_interp_fn_floor, test/test_sympy_utils.py::TestSympyInterp::test_interp_fn_floordiv, test/test_sympy_utils.py::TestSympyInterp::test_interp_fn_ge, test/test_sympy_utils.py::TestSympyInterp::test_interp_fn_gt, test/test_sympy_utils.py::TestSympyInterp::test_interp_fn_le, test/test_sympy_utils.py::TestSympyInterp::test_interp_fn_log, test/test_sympy_utils.py::TestSympyInterp::test_interp_fn_lt, test/test_sympy_utils.py::TestSympyInterp::test_interp_fn_maximum, test/test_sympy_utils.py::TestSympyInterp::test_interp_fn_minimum, test/test_sympy_utils.py::TestSympyInterp::test_interp_fn_mod, test/test_sympy_utils.py::TestSympyInterp::test_interp_fn_mul, test/test_sympy_utils.py::TestSympyInterp::test_interp_fn_ne, test/test_sympy_utils.py::TestSympyInterp::test_interp_fn_neg, test/test_sympy_utils.py::TestSympyInterp::test_interp_fn_not_, test/test_sympy_utils.py::TestSympyInterp::test_interp_fn_or_, test/test_sympy_utils.py::TestSympyInterp::test_interp_fn_pow, test/test_sympy_utils.py::TestSympyInterp::test_interp_fn_pow_by_natural, test/test_sympy_utils.py::TestSympyInterp::test_interp_fn_reciprocal, test/test_sympy_utils.py::TestSympyInterp::test_interp_fn_sqrt, test/test_sympy_utils.py::TestSympyInterp::test_interp_fn_square, test/test_sympy_utils.py::TestSympyInterp::test_interp_fn_sub, test/test_sympy_utils.py::TestSympyInterp::test_interp_fn_truediv, test/test_sympy_utils.py::TestSympyInterp::test_python_interp_fx_fn_abs, test/test_sympy_utils.py::TestSympyInterp::test_python_interp_fx_fn_add, test/test_sympy_utils.py::TestSympyInterp::test_python_interp_fx_fn_and_, test/test_sympy_utils.py::TestSympyInterp::test_python_interp_fx_fn_bitwise_and, test/test_sympy_utils.py::TestSympyInterp::test_python_interp_fx_fn_bitwise_or, test/test_sympy_utils.py::TestSympyInterp::test_python_interp_fx_fn_ceil, test/test_sympy_utils.py::TestSympyInterp::test_python_interp_fx_fn_eq, test/test_sympy_utils.py::TestSympyInterp::test_python_interp_fx_fn_exp, test/test_sympy_utils.py::TestSympyInterp::test_python_interp_fx_fn_floor, test/test_sympy_utils.py::TestSympyInterp::test_python_interp_fx_fn_floordiv, test/test_sympy_utils.py::TestSympyInterp::test_python_interp_fx_fn_ge, test/test_sympy_utils.py::TestSympyInterp::test_python_interp_fx_fn_gt, test/test_sympy_utils.py::TestSympyInterp::test_python_interp_fx_fn_le, test/test_sympy_utils.py::TestSympyInterp::test_python_interp_fx_fn_log, test/test_sympy_utils.py::TestSympyInterp::test_python_interp_fx_fn_lt, test/test_sympy_utils.py::TestSympyInterp::test_python_interp_fx_fn_maximum, test/test_sympy_utils.py::TestSympyInterp::test_python_interp_fx_fn_minimum, test/test_sympy_utils.py::TestSympyInterp::test_python_interp_fx_fn_mod, test/test_sympy_utils.py::TestSympyInterp::test_python_interp_fx_fn_mul, test/test_sympy_utils.py::TestSympyInterp::test_python_interp_fx_fn_ne, test/test_sympy_utils.py::TestSympyInterp::test_python_interp_fx_fn_neg, test/test_sympy_utils.py::TestSympyInterp::test_python_interp_fx_fn_not_, test/test_sympy_utils.py::TestSympyInterp::test_python_interp_fx_fn_or_, test/test_sympy_utils.py::TestSympyInterp::test_python_interp_fx_fn_pow, test/test_sympy_utils.py::TestSympyInterp::test_python_interp_fx_fn_pow_by_natural, test/test_sympy_utils.py::TestSympyInterp::test_python_interp_fx_fn_reciprocal, test/test_sympy_utils.py::TestSympyInterp::test_python_interp_fx_fn_sqrt, test/test_sympy_utils.py::TestSympyInterp::test_python_interp_fx_fn_square, test/test_sympy_utils.py::TestSympyInterp::test_python_interp_fx_fn_sub, test/test_sympy_utils.py::TestSympyInterp::test_python_interp_fx_fn_truediv, test/test_sympy_utils.py::TestSympyInterp::test_tensor_interp_fn_abs, test/test_sympy_utils.py::TestSympyInterp::test_tensor_interp_fn_add, test/test_sympy_utils.py::TestSympyInterp::test_tensor_interp_fn_and_, test/test_sympy_utils.py::TestSympyInterp::test_tensor_interp_fn_bitwise_and, test/test_sympy_utils.py::TestSympyInterp::test_tensor_interp_fn_bitwise_or, test/test_sympy_utils.py::TestSympyInterp::test_tensor_interp_fn_ceil, test/test_sympy_utils.py::TestSympyInterp::test_tensor_interp_fn_eq, test/test_sympy_utils.py::TestSympyInterp::test_tensor_interp_fn_exp, test/test_sympy_utils.py::TestSympyInterp::test_tensor_interp_fn_floor, test/test_sympy_utils.py::TestSympyInterp::test_tensor_interp_fn_floordiv, test/test_sympy_utils.py::TestSympyInterp::test_tensor_interp_fn_ge, test/test_sympy_utils.py::TestSympyInterp::test_tensor_interp_fn_gt, test/test_sympy_utils.py::TestSympyInterp::test_tensor_interp_fn_le, test/test_sympy_utils.py::TestSympyInterp::test_tensor_interp_fn_log, test/test_sympy_utils.py::TestSympyInterp::test_tensor_interp_fn_lt, test/test_sympy_utils.py::TestSympyInterp::test_tensor_interp_fn_maximum, test/test_sympy_utils.py::TestSympyInterp::test_tensor_interp_fn_minimum, test/test_sympy_utils.py::TestSympyInterp::test_tensor_interp_fn_mod, test/test_sympy_utils.py::TestSympyInterp::test_tensor_interp_fn_mul, test/test_sympy_utils.py::TestSympyInterp::test_tensor_interp_fn_ne, test/test_sympy_utils.py::TestSympyInterp::test_tensor_interp_fn_neg, test/test_sympy_utils.py::TestSympyInterp::test_tensor_interp_fn_not_, test/test_sympy_utils.py::TestSympyInterp::test_tensor_interp_fn_or_, test/test_sympy_utils.py::TestSympyInterp::test_tensor_interp_fn_pow, test/test_sympy_utils.py::TestSympyInterp::test_tensor_interp_fn_pow_by_natural, test/test_sympy_utils.py::TestSympyInterp::test_tensor_interp_fn_reciprocal, test/test_sympy_utils.py::TestSympyInterp::test_tensor_interp_fn_sqrt, test/test_sympy_utils.py::TestSympyInterp::test_tensor_interp_fn_square, test/test_sympy_utils.py::TestSympyInterp::test_tensor_interp_fn_sub, test/test_sympy_utils.py::TestSympyInterp::test_tensor_interp_fn_truediv, test/test_sympy_utils.py::TestSympySolve::test_addition, test/test_sympy_utils.py::TestSympySolve::test_floordiv_Equality, test/test_sympy_utils.py::TestSympySolve::test_floordiv_GreaterThan, test/test_sympy_utils.py::TestSympySolve::test_floordiv_LessThan, test/test_sympy_utils.py::TestSympySolve::test_floordiv_StrictGreaterThan, test/test_sympy_utils.py::TestSympySolve::test_floordiv_StrictLessThan, test/test_sympy_utils.py::TestSympySolve::test_floordiv_Unequality, test/test_sympy_utils.py::TestSympySolve::test_floordiv_eq_simplify, test/test_sympy_utils.py::TestSympySolve::test_give_up, test/test_sympy_utils.py::TestSympySolve::test_multiplication_division_Equality, test/test_sympy_utils.py::TestSympySolve::test_multiplication_division_Unequality, test/test_sympy_utils.py::TestSympySolve::test_multiplication_division_inequality_GreaterThan, test/test_sympy_utils.py::TestSympySolve::test_multiplication_division_inequality_LessThan, test/test_sympy_utils.py::TestSympySolve::test_multiplication_division_inequality_StrictGreaterThan, test/test_sympy_utils.py::TestSympySolve::test_multiplication_division_inequality_StrictLessThan, test/test_sympy_utils.py::TestSympySolve::test_noop_Equality, test/test_sympy_utils.py::TestSympySolve::test_noop_GreaterThan, test/test_sympy_utils.py::TestSympySolve::test_noop_LessThan, test/test_sympy_utils.py::TestSympySolve::test_noop_StrictGreaterThan, test/test_sympy_utils.py::TestSympySolve::test_noop_StrictLessThan, test/test_sympy_utils.py::TestSympySolve::test_noop_Unequality, test/test_sympy_utils.py::TestSympySolve::test_noop_rhs_Equality, test/test_sympy_utils.py::TestSympySolve::test_noop_rhs_GreaterThan, test/test_sympy_utils.py::TestSympySolve::test_noop_rhs_LessThan, test/test_sympy_utils.py::TestSympySolve::test_noop_rhs_StrictGreaterThan, test/test_sympy_utils.py::TestSympySolve::test_noop_rhs_StrictLessThan, test/test_sympy_utils.py::TestSympySolve::test_noop_rhs_Unequality, test/test_sympy_utils.py::TestSympySolve::test_simple_floordiv_gcd, test/test_sympy_utils.py::TestSympySolve::test_z3_proof_floordiv_eq_simplify, test/test_sympy_utils.py::TestSympyFunctions::test_pickle, test/test_sympy_utils.py::TestSingletonInt::test_basic, test/test_sympy_utils.py::TestIdentity::test_cast_identity_float, test/test_sympy_utils.py::TestIdentity::test_cast_identity_illegal, test/test_sympy_utils.py::TestIdentity::test_cast_identity_int, test/test_sympy_utils.py::TestIdentity::test_expand_identity, test/test_sympy_utils.py::TestTypedExpr::test_typed_expr 2025-08-14T23:41:15.3246881Z 2025-08-14T23:41:15.3247074Z Running test_tensorexpr_pybind 1/1 ... [2025-08-14 23:41:15.314470] 2025-08-14T23:41:15.3247462Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T23:41:15.3248414Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_tensorexpr_pybind.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 23:41:15.315121] 2025-08-14T23:41:19.0386872Z 2025-08-14T23:41:19.0387783Z test_tensorexpr_pybind 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_tensorexpr_pybind_1.1_aa0ee8ac32b6e29f_.log 2025-08-14T23:41:19.0392975Z Running 17 items in this shard: test/test_tensorexpr_pybind.py::TestTensorExprPyBind::test_alloc_in_loop, test/test_tensorexpr_pybind.py::TestTensorExprPyBind::test_call_raw, test/test_tensorexpr_pybind.py::TestTensorExprPyBind::test_dtype_error, test/test_tensorexpr_pybind.py::TestTensorExprPyBind::test_dynamic_shape, test/test_tensorexpr_pybind.py::TestTensorExprPyBind::test_dynamic_shape_2d, test/test_tensorexpr_pybind.py::TestTensorExprPyBind::test_external_calls, test/test_tensorexpr_pybind.py::TestTensorExprPyBind::test_kernel_shape_prop, test/test_tensorexpr_pybind.py::TestTensorExprPyBind::test_kernel_shape_prop_module, test/test_tensorexpr_pybind.py::TestTensorExprPyBind::test_kernel_with_custom_lowering, test/test_tensorexpr_pybind.py::TestTensorExprPyBind::test_kernel_with_expand, test/test_tensorexpr_pybind.py::TestTensorExprPyBind::test_kernel_with_permute, test/test_tensorexpr_pybind.py::TestTensorExprPyBind::test_kernel_with_scalar_inputs, test/test_tensorexpr_pybind.py::TestTensorExprPyBind::test_kernel_with_t, test/test_tensorexpr_pybind.py::TestTensorExprPyBind::test_kernel_with_tensor_inputs, test/test_tensorexpr_pybind.py::TestTensorExprPyBind::test_kernel_with_transpose, test/test_tensorexpr_pybind.py::TestTensorExprPyBind::test_simple_sum, test/test_tensorexpr_pybind.py::TestExprHandlePyBind::test_unary_ops 2025-08-14T23:41:19.0398533Z 2025-08-14T23:41:19.0398732Z Running test_transformers 1/1 ... [2025-08-14 23:41:19.038671] 2025-08-14T23:41:19.0399103Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T23:41:19.0400036Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_transformers.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 23:41:19.039268] 2025-08-14T23:43:01.6797743Z 2025-08-14T23:43:01.6798632Z test_meta 3/4 was successful, full logs can be found in artifacts with path test/test-reports/test_meta_3.4_806396a21c269af7_.log 2025-08-14T23:43:01.9297517Z Running 10118 items in this shard: test/test_meta.py::TestMetaConverter::test_requires_grad_false, test/test_meta.py::TestMetaConverter::test_tensor_outlives_converter, test/test_meta.py::TestMetaConverter::test_view_as_complex, test/test_meta.py::TestMetaConverter::test_view_as_real, test/test_meta.py::TestMetaConverter::test_view_mutate, test/test_meta.py::TestMetaConverter::test_view_of_non_leaf, test/test_meta.py::TestMetaCUDA::test_batch_norm_backward_output_mask1_cuda, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype___rmod___cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype__refs_atan2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype__refs_clamp_max_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype__refs_clamp_min_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype__refs_fmin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype__refs_gt_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype__refs_heaviside_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype__refs_igamma_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype__refs_igammac_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype__refs_logical_xor_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype__refs_minimum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype__refs_pow_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype__refs_true_divide_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_add_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_clamp_max_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_div_floor_rounding_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_div_no_rounding_mode_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_fmin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_fmod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_logaddexp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_logical_and_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_logical_xor_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_max_binary_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_polar_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_pow_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_special_chebyshev_polynomial_w_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_xlogy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_H_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_H_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_H_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_H_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_T_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_T_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_T_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___getitem___cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___getitem___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___radd___cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___radd___cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___radd___cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___radd___cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___radd___cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rand___cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rand___cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rand___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rdiv___cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rdiv___cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rdiv___cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rmod___cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rmod___cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rmod___cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rmul___cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rmul___cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rmul___cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rmul___cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rmul___cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rmul___cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rmul___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___ror___cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rpow___cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rpow___cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rpow___cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rpow___cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rsub___cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rsub___cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rsub___cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rsub___cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rsub___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rxor___cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__batch_norm_with_update_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__chunk_cat_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__chunk_cat_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_abs_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_abs_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_abs_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_abs_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_abs_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_abs_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_acos_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_acos_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_acos_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_acos_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_add_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_add_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_add_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_addcdiv_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_addcdiv_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_addcdiv_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_addcdiv_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_addcmul_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_asin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_atan_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_atan_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_atan_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_ceil_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_ceil_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_ceil_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_clamp_max_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_clamp_max_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_clamp_max_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_clamp_min_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_clamp_min_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_cos_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_cos_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_cos_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_cosh_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_cosh_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_cosh_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_div_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_div_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_erf_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_erf_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_erfc_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_erfc_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_erfc_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_erfc_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_erfc_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_exp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_expm1_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_expm1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_floor_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_floor_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_floor_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_floor_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_floor_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_frac_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_lerp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_lerp_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_lgamma_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_lgamma_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_lgamma_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_lgamma_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_lgamma_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_lgamma_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_lgamma_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_log10_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_log10_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_log10_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_log1p_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_log1p_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_log1p_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_log1p_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_log1p_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_log1p_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_log2_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_log2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_log2_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_log2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_log2_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_log_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_log_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_log_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_log_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_log_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_max_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_max_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_max_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_maximum_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_maximum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_maximum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_minimum_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_mul_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_mul_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_mul_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_neg_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_neg_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_norm_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_pow_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_pow_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_pow_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_pow_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_reciprocal_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_reciprocal_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_round_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_round_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_round_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_rsqrt_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_rsqrt_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sigmoid_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sigmoid_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sigmoid_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sigmoid_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sigmoid_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sign_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sign_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sign_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sinh_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sinh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sinh_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sinh_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sqrt_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sub_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sub_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_tan_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_tan_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_tan_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_tanh_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_tanh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_tanh_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_trunc_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_trunc_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_zero_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_zero_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_zero_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_zero_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__native_batch_norm_legit_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__segment_reduce_lengths_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__segment_reduce_offsets_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__unsafe_masked_index_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__unsafe_masked_index_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__unsafe_masked_index_put_accumulate_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__unsafe_masked_index_put_accumulate_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__unsafe_masked_index_put_accumulate_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_abs_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_abs_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_abs_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_abs_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_acos_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_acos_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_acos_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_acos_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_add_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_add_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_add_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_add_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_addmm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_addmm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_addmm_decomposed_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_addmv_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_addr_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_addr_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_addr_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_alias_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_alias_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_alias_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_alias_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_all_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_all_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_all_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_amax_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_amax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_amax_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_amax_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_amax_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_amax_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_amin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_aminmax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_aminmax_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_aminmax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_aminmax_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_aminmax_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_aminmax_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_angle_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_angle_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_angle_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_angle_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_any_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_any_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_arange_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_arange_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_arange_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_argmax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_argmax_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_argmin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_argmin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_argmin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_argmin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_argsort_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_argsort_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_argsort_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_argwhere_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_argwhere_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_as_strided_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_as_strided_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_as_strided_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_as_strided_partial_views_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_as_strided_scatter_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_asin_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_asin_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_asin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_asin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_asin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_asin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_asinh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_asinh_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_asinh_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_asinh_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_asinh_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atan2_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atan_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atan_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atan_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atan_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atan_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atan_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atan_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atanh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atanh_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atleast_1d_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atleast_1d_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atleast_1d_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atleast_1d_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atleast_1d_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atleast_1d_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atleast_2d_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atleast_2d_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atleast_2d_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atleast_2d_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atleast_3d_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_baddbmm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_baddbmm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bernoulli_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bernoulli_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bfloat16_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bfloat16_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bfloat16_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bfloat16_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bfloat16_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bincount_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bincount_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bitwise_and_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bitwise_and_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bitwise_left_shift_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bitwise_not_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bitwise_not_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bitwise_not_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bitwise_or_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bitwise_or_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bitwise_or_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bitwise_right_shift_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bitwise_xor_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_block_diag_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_block_diag_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bmm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bmm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bmm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bool_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bool_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bool_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bool_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_broadcast_tensors_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_broadcast_tensors_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_broadcast_tensors_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_broadcast_tensors_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_broadcast_to_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_broadcast_to_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bucketize_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bucketize_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_byte_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_byte_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cartesian_prod_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cartesian_prod_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cat_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cat_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cat_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cauchy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cauchy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cdouble_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cdouble_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ceil_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ceil_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_chalf_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_chalf_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_chalf_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_chalf_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_char_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_char_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_char_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cholesky_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cholesky_inverse_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cholesky_solve_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_chunk_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_chunk_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_chunk_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_chunk_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_clamp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_clamp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_clamp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_clamp_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_clamp_max_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_clamp_max_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_clamp_min_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_clamp_min_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_clamp_min_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_clone_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_clone_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_column_stack_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_column_stack_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_combinations_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_conj_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_conj_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_conj_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_conj_physical_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_conj_physical_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_conj_physical_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_conj_physical_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_constant_pad_nd_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_constant_pad_nd_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_constant_pad_nd_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_constant_pad_nd_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_contiguous_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_contiguous_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_contiguous_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_contiguous_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_copysign_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_copysign_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cos_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cos_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cos_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cos_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cosh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cosh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cosh_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_count_nonzero_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_count_nonzero_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_count_nonzero_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_count_nonzero_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cov_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cov_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cross_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cross_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cummax_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cummax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cummax_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cummin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cummin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cummin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cummin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cumprod_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cumprod_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cumprod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cumsum_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cumulative_trapezoid_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_deg2rad_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diag_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diag_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diag_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diag_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diag_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diag_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diag_embed_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diag_embed_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diag_embed_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diag_embed_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diagflat_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diagflat_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diagonal_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diagonal_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diagonal_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diagonal_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diagonal_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diagonal_scatter_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diagonal_scatter_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diagonal_scatter_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diff_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diff_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_digamma_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_digamma_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_digamma_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_dist_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_div_floor_rounding_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_div_floor_rounding_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_div_floor_rounding_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_div_no_rounding_mode_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_div_no_rounding_mode_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_div_no_rounding_mode_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_div_trunc_rounding_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_div_trunc_rounding_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_dot_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_double_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_double_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_dsplit_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_dsplit_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_dsplit_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_dstack_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_dstack_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_dstack_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_dstack_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_dstack_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_einsum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_empty_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_empty_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_empty_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_empty_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_empty_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_empty_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_empty_like_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_empty_like_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_empty_like_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_empty_like_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_empty_permuted_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_empty_strided_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_empty_strided_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_empty_strided_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_empty_strided_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_eq_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_eq_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_eq_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_eq_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_eq_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_equal_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_erf_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_erfc_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_erfc_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_erfc_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_erfc_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_erfinv_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_erfinv_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_erfinv_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_erfinv_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_exp2_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_exp2_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_exp2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_exp2_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_exp_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_exp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_exp_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_exp_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_expand_as_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_expand_as_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_expand_as_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_expand_as_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_expand_as_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_expand_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_expand_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_expand_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_expand_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_expand_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_expm1_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_expm1_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_exponential_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_eye_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_eye_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_eye_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_eye_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_eye_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_fft2_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_fft2_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_fft2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_fft_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_fftn_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_fftn_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_fftn_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_fftn_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_hfft2_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_hfft2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_hfft2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_hfft2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_hfft2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_hfft_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_hfft_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_hfft_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_hfftn_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_hfftn_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_hfftn_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_hfftn_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ifft2_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ifft2_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ifft2_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ifft_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ifft_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ifft_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ifft_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ifftn_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ifftn_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ifftn_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ifftshift_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ifftshift_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ifftshift_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ifftshift_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ihfft2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ihfft2_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ihfft2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ihfft2_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ihfftn_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ihfftn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ihfftn_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_irfft2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_irfft2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_irfft2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_irfft2_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_irfft_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_irfftn_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_irfftn_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_irfftn_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_irfftn_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_rfft2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_rfft2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_rfft2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_rfft2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_rfft_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_rfft_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_rfftn_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fill_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_flatten_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_flatten_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_flip_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_flip_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_flip_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fliplr_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_flipud_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_flipud_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_float_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_float_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_float_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_float_power_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_float_power_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_floor_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_floor_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fmax_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fmax_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fmin_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fmin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fmin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fmod_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fmod_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_frac_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_full_like_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_full_like_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_full_like_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_full_like_cuda_uint16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_gather_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_gather_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_gather_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_gather_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_gcd_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ge_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ge_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ge_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ge_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_geometric_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_geometric_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_geometric_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_geqrf_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_gradient_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_gradient_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_grid_sampler_2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_gt_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_gt_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_half_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_half_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_hash_tensor_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_heaviside_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_histc_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_histc_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_hsplit_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_hsplit_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_hstack_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_hstack_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_hypot_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_hypot_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_i0_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_i0_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_igamma_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_igammac_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_add_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_fill_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_fill_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_fill_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_fill_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_fill_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_fill_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_fill_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_put_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_put_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_put_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_reduce_amax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_reduce_amax_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_reduce_amax_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_reduce_amin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_reduce_mean_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_reduce_prod_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_reduce_prod_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_select_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_select_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_select_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_select_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_select_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_inner_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_int_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isclose_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isclose_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isclose_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isclose_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isfinite_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isfinite_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isinf_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isinf_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isinf_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isinf_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isnan_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isnan_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isneginf_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isneginf_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isneginf_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isposinf_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isreal_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isreal_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isreal_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isreal_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_istft_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_item_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_item_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_jiterator_2inputs_2outputs_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_jiterator_2inputs_2outputs_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_jiterator_2inputs_2outputs_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_jiterator_2inputs_2outputs_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_jiterator_4inputs_with_extra_args_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_jiterator_4inputs_with_extra_args_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_jiterator_4inputs_with_extra_args_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_jiterator_binary_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_jiterator_binary_return_by_ref_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_jiterator_binary_return_by_ref_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_jiterator_binary_return_by_ref_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_jiterator_unary_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_jiterator_unary_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_jiterator_unary_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_jiterator_unary_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_kron_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_kron_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_kron_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_kthvalue_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_kthvalue_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_lcm_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_lcm_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ldexp_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ldexp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ldexp_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_le_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_le_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_le_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_lerp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_lerp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_lgamma_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_lgamma_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_cholesky_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_cholesky_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_cholesky_ex_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_cholesky_ex_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_cholesky_ex_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_cond_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_cond_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_cross_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_det_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_diagonal_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_diagonal_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_diagonal_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_diagonal_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_eig_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_eig_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_eigh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_eigvals_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_eigvalsh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_householder_product_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_householder_product_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_householder_product_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_inv_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_inv_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_inv_ex_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_inv_ex_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_inv_ex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_ldl_factor_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_ldl_factor_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_ldl_factor_ex_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_ldl_factor_ex_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_ldl_solve_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_ldl_solve_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_lstsq_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_lstsq_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_lstsq_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_lstsq_grad_oriented_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_lu_factor_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_lu_factor_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_lu_solve_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_matrix_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_matrix_power_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_matrix_rank_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_matrix_rank_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_multi_dot_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_multi_dot_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_norm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_norm_subgradients_at_zero_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_norm_subgradients_at_zero_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_pinv_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_pinv_singular_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_qr_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_slogdet_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_solve_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_solve_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_solve_ex_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_solve_ex_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_svd_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_svdvals_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_tensorinv_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_tensorinv_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_tensorsolve_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_tensorsolve_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_vander_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_vander_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linspace_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linspace_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linspace_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linspace_tensor_overload_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log10_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log10_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log10_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log10_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log1p_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log_normal_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log_softmax_with_dtype_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log_softmax_with_dtype_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log_softmax_with_dtype_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log_softmax_with_dtype_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logaddexp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logcumsumexp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logcumsumexp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logdet_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logical_not_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logical_not_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logical_not_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logical_or_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logical_or_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logical_xor_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logical_xor_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logit_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logit_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logspace_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logspace_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logspace_tensor_overload_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logspace_tensor_overload_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logspace_tensor_overload_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logspace_tensor_overload_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logsumexp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logsumexp_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logsumexp_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logsumexp_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logsumexp_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logsumexp_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_long_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_long_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_long_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_long_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_lt_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_lt_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_lu_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_lu_solve_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_lu_solve_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_lu_solve_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_lu_unpack_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mH_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mH_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mH_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mH_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mH_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mH_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mT_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mT_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mT_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_amax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_amin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_amin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_argmax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_argmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_argmax_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_argmax_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_argmin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_argmin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_cumprod_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_cumprod_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_cumsum_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_cumsum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_fill_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_fill_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_fill_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_fill_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_fill_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_log_softmax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_logaddexp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_logaddexp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_logsumexp_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_logsumexp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_logsumexp_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_mean_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_mean_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_mean_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_median_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_median_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_normalize_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_prod_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_prod_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_prod_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_prod_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_scatter_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_select_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_select_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_softmax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_softmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_softmin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_softmin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_softmin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_std_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_std_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_std_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_std_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_sum_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_sum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_sum_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_sum_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_sum_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_var_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_max_binary_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_max_pool2d_with_indices_backward_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_max_pool2d_with_indices_backward_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_max_reduction_no_dim_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_max_reduction_no_dim_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_max_reduction_with_dim_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_max_reduction_with_dim_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_max_reduction_with_dim_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_median_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_median_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_median_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_median_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_median_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_median_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_meshgrid_list_of_tensors_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_meshgrid_list_of_tensors_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_meshgrid_list_of_tensors_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_meshgrid_variadic_tensors_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_meshgrid_variadic_tensors_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_meshgrid_variadic_tensors_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_meshgrid_variadic_tensors_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_meshgrid_variadic_tensors_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_meshgrid_variadic_tensors_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_min_binary_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_min_binary_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_min_binary_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_min_reduction_no_dim_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_min_reduction_no_dim_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_min_reduction_no_dim_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_min_reduction_with_dim_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_minimum_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_minimum_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_minimum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_minimum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_minimum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_minimum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mode_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mode_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mode_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mode_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mode_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_movedim_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_movedim_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_movedim_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_movedim_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_movedim_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_msort_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_msort_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_msort_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mul_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mul_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mul_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mv_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mvlgamma_mvlgamma_p_3_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mvlgamma_mvlgamma_p_3_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mvlgamma_mvlgamma_p_3_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mvlgamma_mvlgamma_p_5_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mvlgamma_mvlgamma_p_5_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mvlgamma_mvlgamma_p_5_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mvlgamma_mvlgamma_p_5_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nan_to_num_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nan_to_num_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nanmedian_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nanmedian_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nanmedian_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nanmedian_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nansum_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_narrow_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_narrow_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_narrow_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_narrow_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_narrow_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_narrow_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_narrow_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_narrow_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_narrow_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_narrow_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_narrow_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ne_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ne_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ne_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ne_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ne_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_neg_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_neg_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_empty_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_empty_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_empty_strided_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_empty_strided_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_empty_strided_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_empty_strided_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_empty_strided_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_empty_strided_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_full_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_full_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_full_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_full_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_full_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_ones_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_ones_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_ones_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_zeros_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_zeros_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_zeros_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nextafter_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_adaptive_avg_pool1d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_adaptive_avg_pool2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_adaptive_avg_pool2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_adaptive_max_pool1d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_adaptive_max_pool2d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_adaptive_max_pool3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_alpha_dropout_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_avg_pool2d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_avg_pool2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_bilinear_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_bilinear_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_binary_cross_entropy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_binary_cross_entropy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_binary_cross_entropy_with_logits_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_binary_cross_entropy_with_logits_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_celu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_celu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_celu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_channel_shuffle_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_channel_shuffle_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_channel_shuffle_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_channel_shuffle_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_conv1d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_conv1d_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_conv1d_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_conv2d_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_conv3d_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_conv3d_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_conv3d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_conv_transpose1d_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_conv_transpose1d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_conv_transpose2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_conv_transpose2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_conv_transpose3d_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_conv_transpose3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_cosine_embedding_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_cosine_similarity_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_cosine_similarity_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_cosine_similarity_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_cosine_similarity_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_ctc_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_dropout3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_elu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_embedding_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_feature_alpha_dropout_with_train_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_feature_alpha_dropout_with_train_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_feature_alpha_dropout_without_train_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_feature_alpha_dropout_without_train_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_fractional_max_pool2d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_fractional_max_pool2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_fractional_max_pool3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_gaussian_nll_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_glu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_grid_sample_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_grid_sample_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_hardsigmoid_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_hardsigmoid_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_hardswish_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_hardswish_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_hardtanh_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_huber_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_instance_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_interpolate_area_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_interpolate_area_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_interpolate_bicubic_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_interpolate_bicubic_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_interpolate_bilinear_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_interpolate_bilinear_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_interpolate_linear_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_l1_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_l1_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_layer_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_layer_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_leaky_relu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_leaky_relu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_linear_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_linear_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_linear_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_linear_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_logsigmoid_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_margin_ranking_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_margin_ranking_loss_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_margin_ranking_loss_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_max_pool1d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_max_pool3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_max_unpool1d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_max_unpool2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_max_unpool3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_max_unpool3d_grad_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_max_unpool3d_grad_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_mish_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_multi_head_attention_forward_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_multi_head_attention_forward_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_multilabel_margin_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_normalize_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_normalize_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pad_circular_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pad_circular_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pad_constant_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pad_reflect_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pad_reflect_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pad_replicate_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pad_replicate_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pad_replicate_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pad_replicate_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pad_replicate_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pad_replicate_negative_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pad_replicate_negative_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pad_replicate_negative_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pad_replicate_negative_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pairwise_distance_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pdist_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pixel_shuffle_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pixel_shuffle_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pixel_shuffle_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pixel_shuffle_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pixel_shuffle_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pixel_unshuffle_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pixel_unshuffle_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pixel_unshuffle_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_poisson_nll_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_prelu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_relu6_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_relu6_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_relu6_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_relu6_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_relu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_relu_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_rms_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_rms_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_scaled_dot_product_attention_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_scaled_dot_product_attention_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_silu_complex_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_silu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_smooth_l1_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_smooth_l1_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_smooth_l1_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_softmin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_softmin_with_dtype_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_softmin_with_dtype_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_softmin_with_dtype_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_softplus_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_softplus_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_softshrink_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_softshrink_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_softsign_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_tanhshrink_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_tanhshrink_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_tanhshrink_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_tanhshrink_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_tanhshrink_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_threshold_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_triplet_margin_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_unfold_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_upsample_bilinear_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_upsample_bilinear_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_upsample_nearest_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nonzero_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nonzero_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nonzero_static_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nonzero_static_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_norm_fro_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_norm_inf_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_norm_inf_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_norm_inf_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_normal_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_normal_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_normal_in_place_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_normal_number_mean_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ones_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ones_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ones_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ones_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ones_like_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ones_like_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ones_like_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ones_like_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ormqr_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ormqr_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ormqr_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_outer_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_outer_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_outer_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_outer_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_outer_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_pca_lowrank_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_permute_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_permute_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_permute_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_permute_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_permute_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_permute_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_permute_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_pinverse_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_polar_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_polygamma_polygamma_n_0_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_polygamma_polygamma_n_0_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_polygamma_polygamma_n_0_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_polygamma_polygamma_n_0_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_polygamma_polygamma_n_1_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_polygamma_polygamma_n_1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_polygamma_polygamma_n_1_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_polygamma_polygamma_n_2_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_polygamma_polygamma_n_2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_polygamma_polygamma_n_2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_polygamma_polygamma_n_2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_polygamma_polygamma_n_2_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_polygamma_polygamma_n_2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_polygamma_polygamma_n_2_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_polygamma_polygamma_n_3_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_polygamma_polygamma_n_3_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_polygamma_polygamma_n_4_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_polygamma_polygamma_n_4_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_polygamma_polygamma_n_4_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_polygamma_polygamma_n_4_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_positive_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_positive_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_positive_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_pow_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_pow_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_pow_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_pow_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_prod_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_prod_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_prod_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_prod_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_prod_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_prod_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_put_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_put_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_put_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_qr_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_qr_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_quantile_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_rad2deg_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_rad2deg_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_rad2deg_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_rad2deg_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_rad2deg_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_rand_like_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_randint_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_randint_like_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_randint_like_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_randint_like_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_randn_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_randn_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_randn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_randn_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_randn_like_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_randn_like_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ravel_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ravel_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_real_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_real_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_reciprocal_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_reciprocal_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_reciprocal_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_renorm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_renorm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_repeat_interleave_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_reshape_as_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_reshape_as_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_reshape_as_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_reshape_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_reshape_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_reshape_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_reshape_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_resize__cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_resize__cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_resize_as__cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_resize_as__cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_resize_as__cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_resize_as__cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_resolve_conj_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_resolve_conj_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_resolve_conj_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_resolve_conj_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_resolve_conj_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_resolve_neg_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_resolve_neg_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_resolve_neg_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_resolve_neg_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_roll_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_roll_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_roll_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_roll_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_roll_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_roll_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_rot90_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_rot90_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_rot90_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_rot90_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_rot90_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_round_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_round_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_round_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_round_decimals_0_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_round_decimals_3_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_round_decimals_neg_3_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_rsqrt_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_rsqrt_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_rsqrt_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_rsqrt_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_rsub_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_rsub_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_rsub_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_rsub_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scalar_tensor_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scalar_tensor_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_add_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_add_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_add_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_add_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_add_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_reduce_amax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_reduce_amax_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_reduce_amin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_reduce_amin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_reduce_amin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_reduce_mean_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_reduce_mean_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_reduce_prod_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_reduce_prod_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_reduce_sum_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_reduce_sum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_reduce_sum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_reduce_sum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_searchsorted_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_select_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_select_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_select_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_select_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_select_scatter_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_select_scatter_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sgn_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sgn_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sgn_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sgn_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_short_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sigmoid_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sigmoid_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sigmoid_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sign_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sign_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_signal_windows_bartlett_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_signal_windows_blackman_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_signal_windows_exponential_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_signal_windows_gaussian_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_signal_windows_gaussian_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_signal_windows_general_cosine_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_signal_windows_general_hamming_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_signal_windows_hann_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_signal_windows_hann_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_signbit_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sinc_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sinh_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_slice_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_slice_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_slice_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_slice_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_softmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_softmax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_softmax_with_dtype_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_softmax_with_dtype_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_softmax_with_dtype_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sort_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sort_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sparse_sampled_addmm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sparse_sampled_addmm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sparse_sampled_addmm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_airy_ai_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_airy_ai_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_bessel_j0_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_bessel_j0_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_bessel_j1_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_bessel_j1_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_chebyshev_polynomial_t_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_chebyshev_polynomial_t_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_chebyshev_polynomial_t_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_chebyshev_polynomial_u_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_chebyshev_polynomial_u_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_chebyshev_polynomial_u_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_chebyshev_polynomial_v_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_chebyshev_polynomial_v_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_chebyshev_polynomial_w_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_chebyshev_polynomial_w_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_entr_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_entr_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_erfcx_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_erfcx_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_erfcx_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_hermite_polynomial_h_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_hermite_polynomial_h_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_hermite_polynomial_h_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_hermite_polynomial_he_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_hermite_polynomial_he_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_hermite_polynomial_he_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_i0e_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_i0e_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_i0e_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_i1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_i1e_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_laguerre_polynomial_l_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_laguerre_polynomial_l_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_laguerre_polynomial_l_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_log_ndtr_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_log_ndtr_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_modified_bessel_i1_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_modified_bessel_k0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_modified_bessel_k0_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_modified_bessel_k1_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_modified_bessel_k1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_modified_bessel_k1_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_ndtr_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_ndtr_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_ndtri_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_ndtri_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_ndtri_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_ndtri_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_polygamma_special_polygamma_n_0_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_polygamma_special_polygamma_n_0_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_polygamma_special_polygamma_n_0_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_scaled_modified_bessel_k0_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_scaled_modified_bessel_k1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_shifted_chebyshev_polynomial_t_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_shifted_chebyshev_polynomial_t_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_shifted_chebyshev_polynomial_t_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_shifted_chebyshev_polynomial_u_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_shifted_chebyshev_polynomial_v_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_shifted_chebyshev_polynomial_v_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_shifted_chebyshev_polynomial_w_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_shifted_chebyshev_polynomial_w_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_spherical_bessel_j0_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_xlog1py_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_xlog1py_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_xlog1py_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_zeta_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_split_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_split_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_split_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_split_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_split_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_split_list_args_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_split_list_args_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_split_list_args_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_split_with_sizes_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_split_with_sizes_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_split_with_sizes_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_split_with_sizes_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_split_with_sizes_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_split_with_sizes_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_split_with_sizes_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sqrt_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sqrt_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sqrt_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sqrt_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sqrt_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_square_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_squeeze_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_squeeze_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_squeeze_multiple_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_squeeze_multiple_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_squeeze_multiple_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_squeeze_multiple_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_squeeze_multiple_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_squeeze_multiple_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_stack_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_stack_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_stack_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_std_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_std_mean_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_std_mean_unbiased_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_std_mean_unbiased_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_std_unbiased_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_std_unbiased_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_std_unbiased_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_stft_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_stft_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sub_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sub_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sum_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sum_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sum_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sum_to_size_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sum_to_size_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_svd_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_svd_lowrank_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_t_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_t_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_t_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_take_along_dim_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_take_along_dim_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_take_along_dim_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_take_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_take_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_take_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tan_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tanh_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tanh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tanh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tensor_split_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tensor_split_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tensor_split_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tensor_split_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tensor_split_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tile_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tile_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tile_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tile_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tile_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_to_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_to_sparse_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_to_sparse_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_to_sparse_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_topk_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_topk_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_topk_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_topk_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_topk_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_topk_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_topk_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_torch_ops_aten__safe_softmax_default_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_torch_ops_aten__safe_softmax_default_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_trace_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_trace_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_trace_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_trace_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_transpose_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_transpose_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_transpose_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_transpose_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_transpose_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_transpose_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_transpose_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_transpose_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_transpose_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_trapezoid_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_trapezoid_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_trapezoid_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_trapz_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_trapz_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_trapz_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_trapz_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tril_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tril_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tril_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_triu_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_triu_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_triu_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_triu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_triu_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_triu_indices_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_true_divide_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_true_divide_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_true_divide_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_true_divide_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_trunc_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_trunc_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unbind_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unbind_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unbind_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unbind_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unbind_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unbind_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unflatten_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unflatten_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unflatten_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unflatten_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unfold_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unfold_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unfold_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unfold_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_uniform_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_uniform_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unique_consecutive_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unique_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unique_cuda_uint32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unsafe_chunk_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unsafe_chunk_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unsafe_split_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unsafe_split_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unsqueeze_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unsqueeze_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unsqueeze_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unsqueeze_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_var_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_var_mean_unbiased_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_var_mean_unbiased_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_var_unbiased_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_var_unbiased_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_view_as_complex_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_view_as_complex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_view_as_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_view_as_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_view_as_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_view_as_real_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_view_as_real_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_view_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_view_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_view_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_view_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_view_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_view_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_view_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_view_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_view_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_view_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_view_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_view_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_vsplit_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_vsplit_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_vsplit_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_vsplit_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_vstack_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_vstack_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_vstack_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_vstack_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_where_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_where_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_where_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_xlogy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_xlogy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_xlogy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_xlogy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_xlogy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_zero__cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_zero__cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_zero__cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_zero__cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_zero__cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_zero__cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_zeros_like_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_zeros_like_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_zeros_like_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_zeros_like_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_H_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_H_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_H_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_H_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_H_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_T_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_T_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_T_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_T_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_T_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___getitem___cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___getitem___cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___getitem___cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___getitem___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___radd___cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___radd___cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___radd___cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___radd___cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___radd___cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___radd___cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rand___cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rdiv___cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rdiv___cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rmatmul___cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rmatmul___cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rmod___cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rmod___cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rmod___cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rmod___cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rmul___cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rmul___cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rmul___cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rmul___cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rmul___cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___ror___cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___ror___cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___ror___cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___ror___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rpow___cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rpow___cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rsub___cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rsub___cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rsub___cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rsub___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rxor___cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rxor___cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rxor___cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__chunk_cat_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__chunk_cat_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__chunk_cat_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__chunk_cat_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__chunk_cat_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_abs_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_abs_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_abs_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_abs_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_acos_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_acos_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_acos_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_add_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_add_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_addcdiv_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_addcmul_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_addcmul_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_addcmul_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_asin_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_asin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_asin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_atan_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_atan_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_atan_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_atan_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_ceil_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_ceil_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_clamp_max_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_clamp_max_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_clamp_min_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_clamp_min_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_clamp_min_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_clamp_min_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_clamp_min_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_cos_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_cosh_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_cosh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_cosh_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_div_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_div_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_erf_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_erfc_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_erfc_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_erfc_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_erfc_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_exp_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_exp_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_exp_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_exp_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_expm1_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_expm1_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_expm1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_floor_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_floor_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_floor_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_floor_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_floor_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_floor_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_frac_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_frac_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_lerp_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_lerp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_lerp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_lerp_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_lerp_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_lerp_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_lgamma_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_lgamma_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_lgamma_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_log10_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_log10_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_log10_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_log1p_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_log2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_log2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_log2_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_log2_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_log_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_log_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_max_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_max_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_max_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_max_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_max_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_max_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_maximum_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_maximum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_maximum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_maximum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_maximum_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_minimum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_minimum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_minimum_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_mul_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_mul_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_mul_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_mul_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_mul_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_neg_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_norm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_norm_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_pow_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_pow_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_pow_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_reciprocal_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_reciprocal_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_reciprocal_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_round_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_round_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_rsqrt_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_rsqrt_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_rsqrt_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_rsqrt_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sigmoid_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sigmoid_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sigmoid_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sigmoid_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sign_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sign_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sign_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sign_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sin_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sinh_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sinh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sinh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sinh_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sinh_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sqrt_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sqrt_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sub_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sub_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sub_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sub_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_tan_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_tan_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_trunc_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_zero_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_zero_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__native_batch_norm_legit_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__native_batch_norm_legit_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__segment_reduce_offsets_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__unsafe_masked_index_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__unsafe_masked_index_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__unsafe_masked_index_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__unsafe_masked_index_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__unsafe_masked_index_put_accumulate_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__unsafe_masked_index_put_accumulate_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__unsafe_masked_index_put_accumulate_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__unsafe_masked_index_put_accumulate_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__upsample_bilinear2d_aa_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_abs_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_abs_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_acos_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_acosh_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_acosh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_acosh_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_add_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_add_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_add_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_add_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_add_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_addbmm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_addbmm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_addbmm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_addbmm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_addcdiv_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_addcmul_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_addcmul_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_addcmul_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_addmm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_addmm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_addmm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_addmv_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_addr_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_addr_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_addr_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_addr_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_addr_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_addr_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_alias_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_alias_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_alias_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_all_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_allclose_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_amax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_amax_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_amin_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_amin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_amin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_amin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_amin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_aminmax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_aminmax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_aminmax_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_angle_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_angle_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_angle_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_angle_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_any_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_any_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_any_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_arange_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_arange_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_arange_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_arange_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_argmax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_argmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_argmin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_argmin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_argmin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_argsort_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_argsort_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_argwhere_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_argwhere_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_as_strided_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_as_strided_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_as_strided_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_as_strided_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_as_strided_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_as_strided_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_as_strided_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_as_strided_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_as_strided_partial_views_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_as_strided_partial_views_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_as_strided_partial_views_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_as_strided_scatter_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_as_strided_scatter_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_as_strided_scatter_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_as_strided_scatter_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_as_strided_scatter_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_as_strided_scatter_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_asin_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_asin_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_asin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_asin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_asin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_asin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_asinh_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_asinh_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_asinh_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atan2_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atan2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atan2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atan2_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atan_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atan_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atan_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atan_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atanh_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atanh_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atanh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atanh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atleast_1d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atleast_1d_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atleast_2d_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atleast_2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atleast_2d_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atleast_2d_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atleast_2d_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atleast_3d_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atleast_3d_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atleast_3d_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atleast_3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atleast_3d_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_baddbmm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bernoulli_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bfloat16_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bfloat16_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bfloat16_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bfloat16_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bincount_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bincount_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bitwise_left_shift_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bitwise_not_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bitwise_not_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bitwise_or_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bitwise_or_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bitwise_or_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bitwise_right_shift_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bitwise_xor_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_block_diag_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_block_diag_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_block_diag_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_block_diag_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_block_diag_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bool_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bool_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_broadcast_tensors_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_broadcast_tensors_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_broadcast_tensors_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_broadcast_tensors_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_broadcast_to_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_broadcast_to_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_broadcast_to_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_broadcast_to_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_broadcast_to_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bucketize_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bucketize_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bucketize_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_byte_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_byte_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_byte_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cartesian_prod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cat_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cat_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cauchy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cdouble_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cdouble_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cdouble_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ceil_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ceil_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cfloat_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cfloat_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cfloat_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cfloat_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_chalf_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_chalf_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_chalf_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_chalf_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_chalf_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_char_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_char_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_char_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cholesky_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cholesky_solve_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_chunk_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_chunk_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_chunk_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_clamp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_clamp_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_clamp_max_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_clamp_max_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_clamp_max_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_clamp_min_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_clone_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_clone_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_clone_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_clone_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_clone_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_clone_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_clone_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_column_stack_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_column_stack_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_complex_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_complex_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_conj_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_conj_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_conj_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_conj_physical_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_conj_physical_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_conj_physical_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_constant_pad_nd_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_constant_pad_nd_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_constant_pad_nd_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_constant_pad_nd_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_constant_pad_nd_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_constant_pad_nd_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_contiguous_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_copysign_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_copysign_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_copysign_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_corrcoef_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_corrcoef_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_corrcoef_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_corrcoef_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_corrcoef_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cos_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cos_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cos_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cosh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cosh_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cosh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cosh_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_count_nonzero_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_count_nonzero_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_count_nonzero_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_count_nonzero_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cov_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cov_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cross_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cross_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cummax_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cummax_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cummin_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cummin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cumprod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cumsum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cumsum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cumsum_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cumsum_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cumulative_trapezoid_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cumulative_trapezoid_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cumulative_trapezoid_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_deg2rad_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_deg2rad_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_deg2rad_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_deg2rad_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_deg2rad_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diag_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diag_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diag_embed_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diagflat_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diagflat_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diagonal_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diagonal_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diagonal_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diagonal_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diagonal_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diagonal_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diagonal_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diagonal_scatter_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diagonal_scatter_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diagonal_scatter_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diff_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diff_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_digamma_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_digamma_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_digamma_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_digamma_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_digamma_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_dist_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_dist_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_div_floor_rounding_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_div_floor_rounding_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_div_floor_rounding_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_div_floor_rounding_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_div_no_rounding_mode_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_div_trunc_rounding_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_div_trunc_rounding_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_div_trunc_rounding_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_double_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_double_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_double_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_double_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_double_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_double_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_double_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_dsplit_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_dsplit_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_dsplit_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_dsplit_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_einsum_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_empty_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_empty_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_empty_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_empty_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_empty_like_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_empty_like_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_empty_like_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_empty_permuted_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_empty_permuted_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_empty_permuted_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_empty_strided_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_eq_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_equal_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_equal_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_erf_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_erf_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_erfc_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_erfinv_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_erfinv_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_erfinv_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_erfinv_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_erfinv_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_erfinv_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_exp2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_exp2_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_exp2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_exp2_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_exp_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_exp_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_exp_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_expand_as_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_expand_as_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_expand_as_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_expand_as_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_expand_as_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_expand_as_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_expand_as_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_expand_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_expand_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_expand_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_expand_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_expand_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_expand_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_expm1_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_expm1_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_expm1_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_expm1_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_eye_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_eye_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_eye_cuda_float8_e4m3fnuz, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_eye_cuda_float8_e5m2, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_eye_cuda_float8_e5m2fnuz, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_eye_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_eye_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_fft2_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_fft2_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_fft2_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_fft_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_fft_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_fft_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_fft_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_fftn_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_fftn_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_fftn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_fftn_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_fftn_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_hfft2_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_hfft_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_hfftn_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_hfftn_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_hfftn_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_hfftn_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_hfftn_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ifft2_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ifft2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ifft2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ifft2_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ifft_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ifft_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ifft_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ifft_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ifftn_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ifftn_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ifftn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ifftn_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ifftshift_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ifftshift_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ifftshift_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ihfft2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ihfft2_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ihfft2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ihfft2_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ihfft_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ihfft_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ihfft_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ihfft_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ihfft_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ihfftn_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ihfftn_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_irfft2_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_irfft2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_irfft2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_irfft2_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_irfft2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_irfft2_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_irfft_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_irfft_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_irfft_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_irfft_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_irfft_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_irfftn_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_irfftn_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_rfft2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_rfft_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_rfftn_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_rfftn_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fill_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fill_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fill_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fill_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_flatten_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_flatten_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_flatten_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_flatten_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_flatten_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_flip_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_flip_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_flip_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_flip_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fliplr_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fliplr_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fliplr_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_flipud_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_flipud_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_flipud_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_flipud_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_float_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_float_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_float_power_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_float_power_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_float_power_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_float_power_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_float_power_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_floor_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_floor_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_floor_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_floor_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_floor_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_floor_divide_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_floor_divide_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_floor_divide_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fmax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fmax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fmax_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fmax_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fmax_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fmin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fmin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fmin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fmin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fmod_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fmod_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_frexp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_frexp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_full_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_full_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_full_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_full_like_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_full_like_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_full_like_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_gather_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_gcd_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ge_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ge_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_geometric_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_geometric_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_geometric_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_geometric_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_gradient_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_grid_sampler_2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_grid_sampler_2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_gt_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_gt_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_gt_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_half_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_half_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_hash_tensor_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_histc_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_histc_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_hsplit_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_hsplit_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_hsplit_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_hsplit_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_hstack_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_hstack_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_hstack_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_hstack_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_i0_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_add_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_add_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_put_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_put_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_put_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_put_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_reduce_amax_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_reduce_amin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_reduce_amin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_reduce_amin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_reduce_mean_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_reduce_mean_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_reduce_mean_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_reduce_prod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_reduce_prod_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_select_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_select_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_select_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_inner_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_int_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_int_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isclose_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isclose_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isclose_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isclose_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isfinite_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isfinite_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isfinite_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isinf_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isinf_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isinf_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isnan_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isposinf_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isposinf_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isposinf_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isposinf_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isreal_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isreal_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_item_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_item_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_item_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_item_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_item_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_2inputs_2outputs_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_2inputs_2outputs_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_2inputs_2outputs_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_2inputs_2outputs_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_2inputs_2outputs_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_4inputs_with_extra_args_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_4inputs_with_extra_args_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_binary_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_binary_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_binary_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_binary_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_unary_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_unary_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_unary_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_kron_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_kron_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_kthvalue_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_kthvalue_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ldexp_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ldexp_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ldexp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ldexp_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ldexp_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_le_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_lerp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_lgamma_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_cholesky_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_cholesky_ex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_cond_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_cross_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_cross_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_cross_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_cross_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_det_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_diagonal_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_eig_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_eigvals_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_eigvals_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_eigvals_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_eigvalsh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_eigvalsh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_householder_product_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_inv_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_ldl_factor_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_ldl_factor_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_ldl_factor_ex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_ldl_factor_ex_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_ldl_solve_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_lstsq_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_lstsq_grad_oriented_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_lstsq_grad_oriented_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_lu_factor_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_lu_factor_ex_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_lu_solve_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_matrix_norm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_matrix_power_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_matrix_rank_hermitian_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_multi_dot_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_norm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_norm_subgradients_at_zero_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_pinv_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_pinv_singular_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_pinv_singular_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_qr_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_qr_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_slogdet_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_solve_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_solve_triangular_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_svd_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_vander_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_vander_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_vector_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_vector_norm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_vector_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linspace_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linspace_tensor_overload_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linspace_tensor_overload_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linspace_tensor_overload_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linspace_tensor_overload_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log10_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log10_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log10_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log10_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log1p_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log1p_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log1p_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log2_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log2_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log2_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log_softmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log_softmax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log_softmax_with_dtype_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log_softmax_with_dtype_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logaddexp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logdet_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logdet_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logical_and_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logical_and_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logical_and_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logical_not_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logical_or_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logical_or_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logical_or_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logical_xor_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logical_xor_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logspace_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logspace_tensor_overload_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logspace_tensor_overload_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logspace_tensor_overload_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logsumexp_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logsumexp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logsumexp_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logsumexp_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_long_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_lt_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_lt_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_lt_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_lu_solve_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_lu_solve_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mH_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mH_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mT_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mT_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mT_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mT_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mT_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mT_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_amax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_amax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_amax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_amax_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_amin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_amin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_argmax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_argmax_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_argmin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_argmin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_cumprod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_cumprod_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_cumprod_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_cumsum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_cumsum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_cumsum_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_cumsum_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_cumsum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_fill_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_fill_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_fill_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_log_softmax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_log_softmax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_log_softmax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_logaddexp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_logaddexp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_logsumexp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_logsumexp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_logsumexp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_logsumexp_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_logsumexp_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_mean_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_normalize_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_prod_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_prod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_prod_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_scatter_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_scatter_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_scatter_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_scatter_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_scatter_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_select_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_select_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_select_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_select_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_select_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_select_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_softmax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_softmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_std_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_std_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_std_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_sum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_sum_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_var_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_var_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_var_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_var_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_matmul_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_matmul_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_matmul_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_matrix_exp_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_max_binary_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_max_binary_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_max_binary_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_max_pool2d_with_indices_backward_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_max_reduction_no_dim_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_max_reduction_no_dim_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_max_reduction_no_dim_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_max_reduction_no_dim_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_max_reduction_with_dim_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_max_reduction_with_dim_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_max_reduction_with_dim_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_maximum_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_maximum_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mean_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_median_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_median_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_median_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_meshgrid_list_of_tensors_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_meshgrid_list_of_tensors_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_meshgrid_list_of_tensors_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_meshgrid_variadic_tensors_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_meshgrid_variadic_tensors_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_min_reduction_no_dim_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_min_reduction_no_dim_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_min_reduction_no_dim_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_min_reduction_with_dim_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_min_reduction_with_dim_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_minimum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_minimum_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_minimum_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_minimum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mode_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_movedim_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_movedim_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_movedim_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_msort_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mul_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mul_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mul_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mul_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_multinomial_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mv_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mvlgamma_mvlgamma_p_1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mvlgamma_mvlgamma_p_1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mvlgamma_mvlgamma_p_1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mvlgamma_mvlgamma_p_3_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mvlgamma_mvlgamma_p_3_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mvlgamma_mvlgamma_p_5_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nan_to_num_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nan_to_num_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nanmean_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nanmean_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nanmedian_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nanmedian_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nanmedian_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nansum_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nansum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nansum_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_narrow_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_narrow_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_narrow_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_narrow_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_narrow_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_narrow_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_narrow_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_narrow_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_narrow_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_narrow_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_narrow_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_native_layer_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ne_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ne_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_neg_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_neg_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_neg_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_new_empty_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_new_empty_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_new_empty_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_new_empty_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_new_empty_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_new_empty_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_new_empty_strided_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_new_full_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_new_zeros_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_new_zeros_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nextafter_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_adaptive_avg_pool1d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_adaptive_avg_pool1d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_adaptive_avg_pool2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_adaptive_avg_pool3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_adaptive_max_pool1d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_adaptive_max_pool2d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_adaptive_max_pool2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_alpha_dropout_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_avg_pool1d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_avg_pool2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_avg_pool3d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_batch_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_batch_norm_without_cudnn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_batch_norm_without_cudnn_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_bilinear_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_binary_cross_entropy_with_logits_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_channel_shuffle_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_channel_shuffle_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_conv1d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_conv1d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_conv1d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_conv2d_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_conv2d_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_conv2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_conv_transpose1d_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_conv_transpose2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_conv_transpose2d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_conv_transpose3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_conv_transpose3d_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_conv_transpose3d_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_conv_transpose3d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_cosine_embedding_loss_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_cosine_embedding_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_cosine_embedding_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_cosine_similarity_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_dropout2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_dropout2d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_dropout2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_dropout_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_embedding_bag_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_embedding_bag_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_feature_alpha_dropout_with_train_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_feature_alpha_dropout_without_train_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_feature_alpha_dropout_without_train_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_feature_alpha_dropout_without_train_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_feature_alpha_dropout_without_train_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_feature_alpha_dropout_without_train_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_feature_alpha_dropout_without_train_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_fractional_max_pool2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_fractional_max_pool3d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_gaussian_nll_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_gelu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_gelu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_grid_sample_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_grid_sample_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_group_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_group_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_hardshrink_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_hardshrink_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_hardsigmoid_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_hardsigmoid_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_hardsigmoid_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_hardswish_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_hardtanh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_hardtanh_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_hinge_embedding_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_hinge_embedding_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_huber_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_instance_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_instance_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_interpolate_area_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_interpolate_bilinear_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_interpolate_bilinear_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_interpolate_linear_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_interpolate_linear_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_interpolate_nearest_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_interpolate_nearest_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_kl_div_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_l1_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_l1_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_l1_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_layer_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_linear_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_linear_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_local_response_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_logsigmoid_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_logsigmoid_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_margin_ranking_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_margin_ranking_loss_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_margin_ranking_loss_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_max_pool2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_max_pool3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_max_unpool1d_grad_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_max_unpool2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_max_unpool2d_grad_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_max_unpool3d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_max_unpool3d_grad_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_mish_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_mish_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_mish_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_mse_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_mse_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_multi_margin_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_multi_margin_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_multilabel_margin_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_multilabel_margin_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_multilabel_soft_margin_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_normalize_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pad_circular_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pad_circular_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pad_circular_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pad_constant_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pad_reflect_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pad_reflect_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pad_replicate_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pad_replicate_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pad_replicate_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pad_replicate_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pad_replicate_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pad_replicate_negative_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pad_replicate_negative_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pad_replicate_negative_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pad_replicate_negative_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pad_replicate_negative_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pairwise_distance_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pairwise_distance_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pixel_shuffle_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pixel_shuffle_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pixel_unshuffle_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pixel_unshuffle_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_relu6_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_relu6_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_relu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_relu_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_rms_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_rrelu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_scaled_dot_product_attention_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_selu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_selu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_silu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_soft_margin_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_softmin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_softmin_with_dtype_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_softmin_with_dtype_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_softmin_with_dtype_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_softmin_with_dtype_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_softmin_with_dtype_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_softshrink_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_softsign_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_softsign_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_tanhshrink_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_tanhshrink_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_threshold_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_threshold_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_threshold_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_threshold_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_triplet_margin_loss_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_triplet_margin_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_triplet_margin_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_triplet_margin_loss_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_triplet_margin_loss_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_triplet_margin_with_distance_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_triplet_margin_with_distance_loss_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_triplet_margin_with_distance_loss_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_triplet_margin_with_distance_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_triplet_margin_with_distance_loss_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_triplet_margin_with_distance_loss_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_unfold_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_unfold_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_upsample_bilinear_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_upsample_bilinear_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_upsample_nearest_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nonzero_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nonzero_static_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nonzero_static_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nonzero_static_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_norm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_norm_fro_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_norm_fro_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_norm_inf_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_norm_inf_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_norm_nuc_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_normal_in_place_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_normal_in_place_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_normal_number_mean_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ones_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ones_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ones_like_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ones_like_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ormqr_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_outer_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_outer_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_pca_lowrank_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_permute_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_permute_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_permute_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_permute_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_pinverse_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_pinverse_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_polygamma_polygamma_n_1_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_polygamma_polygamma_n_1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_polygamma_polygamma_n_1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_polygamma_polygamma_n_2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_polygamma_polygamma_n_2_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_polygamma_polygamma_n_2_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_polygamma_polygamma_n_3_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_polygamma_polygamma_n_3_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_polygamma_polygamma_n_3_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_polygamma_polygamma_n_3_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_polygamma_polygamma_n_3_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_polygamma_polygamma_n_3_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_polygamma_polygamma_n_4_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_polygamma_polygamma_n_4_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_polygamma_polygamma_n_4_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_polygamma_polygamma_n_4_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_positive_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_positive_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_positive_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_pow_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_pow_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_prod_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_prod_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_prod_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_prod_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_put_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_put_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_put_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_qr_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_qr_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_quantile_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_quantile_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_rad2deg_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_rad2deg_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_rad2deg_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_rand_like_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_rand_like_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_randint_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_randint_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_randint_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_randint_like_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_randint_like_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_randn_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_randn_like_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_randn_like_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ravel_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ravel_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_real_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_real_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_real_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_real_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_reciprocal_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_reciprocal_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_remainder_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_remainder_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_renorm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_repeat_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_repeat_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_repeat_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_repeat_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_repeat_interleave_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_repeat_interleave_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_repeat_interleave_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_repeat_interleave_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_reshape_as_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_reshape_as_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_reshape_as_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_reshape_as_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_reshape_as_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_reshape_as_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_reshape_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_reshape_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_reshape_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_resize__cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_resize__cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_resize__cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_resize__cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_resize_as__cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_resize_as__cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_resolve_conj_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_resolve_conj_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_resolve_conj_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_resolve_neg_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_resolve_neg_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_resolve_neg_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_resolve_neg_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_resolve_neg_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_roll_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_rot90_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_rot90_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_rot90_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_round_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_round_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_round_decimals_0_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_round_decimals_0_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_round_decimals_0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_round_decimals_3_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_round_decimals_3_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_round_decimals_neg_3_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_rsqrt_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_rsqrt_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_rsqrt_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_rsqrt_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_rsqrt_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_rsub_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scalar_tensor_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scalar_tensor_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_add_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_add_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_reduce_amax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_reduce_amax_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_reduce_amax_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_reduce_amin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_reduce_amin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_reduce_mean_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_reduce_mean_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_reduce_mean_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_reduce_mean_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_reduce_prod_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_reduce_sum_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_reduce_sum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_reduce_sum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_reduce_sum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_searchsorted_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_select_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_select_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_select_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_select_scatter_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_select_scatter_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_select_scatter_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_select_scatter_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_select_scatter_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sgn_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sgn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_short_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sigmoid_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sigmoid_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sign_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sign_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sign_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sign_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_signal_windows_bartlett_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_signal_windows_blackman_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_signal_windows_gaussian_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_signal_windows_general_hamming_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_signal_windows_general_hamming_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_signal_windows_hann_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_signbit_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_signbit_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_signbit_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sin_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sinc_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sinc_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sinh_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sinh_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sinh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sinh_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sinh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sinh_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sinh_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_slice_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_slice_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_slice_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_slice_scatter_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_slice_scatter_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_slice_scatter_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_slice_scatter_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_softmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_softmax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_softmax_with_dtype_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sort_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sort_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_airy_ai_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_airy_ai_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_airy_ai_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_bessel_j0_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_bessel_j0_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_bessel_j0_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_bessel_j1_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_bessel_j1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_bessel_y0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_bessel_y0_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_bessel_y1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_chebyshev_polynomial_t_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_chebyshev_polynomial_u_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_chebyshev_polynomial_u_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_chebyshev_polynomial_v_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_chebyshev_polynomial_v_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_chebyshev_polynomial_v_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_entr_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_entr_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_entr_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_erfcx_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_erfcx_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_i0e_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_i0e_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_i0e_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_i1_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_i1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_i1e_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_i1e_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_i1e_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_laguerre_polynomial_l_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_laguerre_polynomial_l_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_legendre_polynomial_p_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_log_ndtr_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_modified_bessel_i0_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_modified_bessel_i1_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_modified_bessel_k0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_modified_bessel_k1_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_modified_bessel_k1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_ndtr_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_ndtr_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_ndtr_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_ndtr_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_ndtr_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_ndtri_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_ndtri_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_polygamma_special_polygamma_n_0_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_polygamma_special_polygamma_n_0_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_polygamma_special_polygamma_n_0_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_scaled_modified_bessel_k0_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_scaled_modified_bessel_k1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_scaled_modified_bessel_k1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_shifted_chebyshev_polynomial_t_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_shifted_chebyshev_polynomial_t_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_shifted_chebyshev_polynomial_t_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_shifted_chebyshev_polynomial_t_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_shifted_chebyshev_polynomial_u_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_shifted_chebyshev_polynomial_u_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_shifted_chebyshev_polynomial_u_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_shifted_chebyshev_polynomial_w_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_shifted_chebyshev_polynomial_w_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_shifted_chebyshev_polynomial_w_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_xlog1py_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_xlog1py_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_zeta_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_split_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_split_list_args_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_split_list_args_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_split_list_args_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_split_list_args_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_split_with_sizes_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_split_with_sizes_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_split_with_sizes_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_split_with_sizes_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_split_with_sizes_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_split_with_sizes_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sqrt_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_square_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_square_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_square_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_squeeze_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_squeeze_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_squeeze_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_squeeze_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_squeeze_multiple_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_squeeze_multiple_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_squeeze_multiple_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_stack_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_std_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_std_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_std_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_std_mean_unbiased_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_stft_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_stft_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sub_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sub_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sub_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sub_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sum_to_size_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sum_to_size_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_svd_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_svd_lowrank_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_t_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_t_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_t_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_t_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_take_along_dim_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_take_along_dim_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_take_along_dim_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_take_along_dim_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_take_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_take_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_take_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_take_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tan_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tan_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tan_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tan_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tan_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tan_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tensor_split_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tensor_split_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tensor_split_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tensordot_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tensordot_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tensordot_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tensordot_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tile_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tile_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tile_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_to_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_to_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_to_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_to_sparse_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_to_sparse_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_to_sparse_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_to_sparse_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_topk_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_torch_ops_aten__efficient_attention_forward_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_torch_ops_aten__safe_softmax_default_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_torch_ops_aten__safe_softmax_default_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_torch_ops_aten__safe_softmax_default_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_trace_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_trace_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_trace_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_trace_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_trace_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_trace_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_transpose_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_transpose_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_transpose_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_transpose_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_transpose_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_transpose_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_trapezoid_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_trapezoid_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_trapz_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tril_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tril_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_triu_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_true_divide_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_true_divide_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_true_divide_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_true_divide_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_true_divide_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_true_divide_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_trunc_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_trunc_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unbind_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unbind_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unbind_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unbind_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unflatten_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unflatten_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unflatten_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unflatten_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unfold_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unfold_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unfold_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unfold_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unfold_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unfold_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_uniform_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unique_consecutive_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unique_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unique_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unique_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unique_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unique_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unsafe_chunk_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unsafe_chunk_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unsafe_split_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unsafe_split_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unsafe_split_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unsafe_split_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unsqueeze_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unsqueeze_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unsqueeze_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unsqueeze_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unsqueeze_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unsqueeze_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unsqueeze_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unsqueeze_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unsqueeze_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_var_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_var_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_var_mean_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_var_mean_unbiased_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_view_as_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_view_as_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_view_as_real_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_view_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_view_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_view_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_view_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_view_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_view_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_vsplit_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_vstack_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_vstack_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_vstack_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_vstack_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_where_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_where_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_where_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_where_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_where_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_xlogy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_xlogy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_zero__cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_zero__cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_zero__cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_zeros_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_zeros_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_zeros_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_zeros_like_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_zeros_like_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_H_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_T_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_T_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_T_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_T_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___getitem___cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___getitem___cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___getitem___cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___radd___cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___radd___cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___radd___cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___radd___cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rand___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rdiv___cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rdiv___cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rdiv___cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rdiv___cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rdiv___cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rdiv___cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rdiv___cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rmatmul___cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rmatmul___cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rmod___cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rmod___cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rmod___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rmul___cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rmul___cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rmul___cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rmul___cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rpow___cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rpow___cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rsub___cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rsub___cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rxor___cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rxor___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__batch_norm_with_update_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__batch_norm_with_update_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__chunk_cat_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__chunk_cat_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__chunk_cat_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__chunk_cat_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__chunk_cat_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__chunk_cat_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_abs_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_abs_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_abs_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_acos_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_acos_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_acos_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_acos_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_add_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_add_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_addcdiv_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_addcmul_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_addcmul_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_addcmul_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_addcmul_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_asin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_asin_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_asin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_asin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_atan_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_ceil_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_ceil_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_clamp_max_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_clamp_max_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_clamp_max_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_clamp_max_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_clamp_min_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_clamp_min_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_clamp_min_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_clamp_min_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_clamp_min_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_clamp_min_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_cos_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_cos_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_cos_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_cosh_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_cosh_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_cosh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_cosh_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_cosh_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_div_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_div_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_div_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_div_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_div_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_erf_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_erf_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_erf_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_erf_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_erfc_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_erfc_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_erfc_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_exp_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_exp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_exp_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_exp_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_expm1_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_expm1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_floor_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_floor_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_floor_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_floor_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_frac_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_lgamma_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_lgamma_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_lgamma_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_log1p_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_log1p_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_log1p_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_log1p_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_log2_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_log2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_log2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_log2_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_log2_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_log_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_log_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_log_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_maximum_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_maximum_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_minimum_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_minimum_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_minimum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_minimum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_minimum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_mul_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_mul_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_mul_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_neg_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_neg_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_neg_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_neg_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_neg_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_norm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_norm_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_norm_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_pow_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_pow_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_pow_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_pow_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_reciprocal_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_reciprocal_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_reciprocal_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_round_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_round_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_rsqrt_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_rsqrt_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sigmoid_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sigmoid_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sigmoid_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sigmoid_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sign_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sign_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sign_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sin_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sinh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sinh_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sqrt_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sqrt_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sqrt_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sqrt_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sqrt_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sub_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sub_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sub_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sub_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_tan_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_tan_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_trunc_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_trunc_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_trunc_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_trunc_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_trunc_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_trunc_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_trunc_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_zero_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_zero_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_zero_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__native_batch_norm_legit_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__native_batch_norm_legit_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__segment_reduce_offsets_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__segment_reduce_offsets_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__unsafe_masked_index_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__unsafe_masked_index_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__unsafe_masked_index_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__unsafe_masked_index_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__unsafe_masked_index_put_accumulate_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__unsafe_masked_index_put_accumulate_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__unsafe_masked_index_put_accumulate_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__unsafe_masked_index_put_accumulate_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__upsample_bilinear2d_aa_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_abs_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_abs_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_acos_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_acos_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_acos_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_acosh_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_acosh_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_add_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_add_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_add_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_add_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_add_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_add_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_addbmm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_addcmul_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_addcmul_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_addcmul_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_addmm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_addmm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_addmv_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_addmv_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_addmv_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_addr_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_addr_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_alias_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_alias_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides___rmod___cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides___rxor___cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides__foreach_asin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides__foreach_atan_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides__foreach_clamp_max_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides__foreach_cos_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides__foreach_erfc_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides__foreach_floor_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides__foreach_log2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides__foreach_maximum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides__foreach_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides__foreach_pow_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides__foreach_round_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides__foreach_sigmoid_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides__foreach_sinh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides__foreach_tanh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides__foreach_trunc_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_acos_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_acosh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_add_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_addbmm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_amax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_amin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_asin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_atanh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_atleast_2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_baddbmm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_block_diag_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_bmm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_byte_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_cartesian_prod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_cauchy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_cdouble_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_clamp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_clamp_max_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_clamp_min_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_cos_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_diagonal_scatter_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_div_no_rounding_mode_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_dot_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_eq_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_erfc_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_fft_fft_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_fft_ifft2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_fft_ifftn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_fft_ihfftn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_fft_irfftn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_fill_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_fliplr_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_fmod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_full_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_gradient_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_half_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_heaviside_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_hstack_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_igammac_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_int_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_isnan_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_jiterator_binary_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_kron_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_lerp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_lgamma_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_linalg_cross_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_linalg_inv_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_linalg_ldl_factor_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_linalg_matrix_rank_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_linalg_matrix_rank_hermitian_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_linalg_norm_subgradients_at_zero_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_linalg_pinv_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_linalg_slogdet_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_linalg_solve_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_linalg_svdvals_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_linalg_vander_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_linalg_vecdot_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_log1p_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_log_normal_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_log_softmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_logical_not_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_logical_xor_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_logsumexp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_lu_unpack_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_mT_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_masked_argmin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_masked_log_softmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_masked_logaddexp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_matmul_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_max_binary_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_maximum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_meshgrid_variadic_tensors_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_mm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_movedim_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_msort_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_mv_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nan_to_num_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_native_dropout_backward_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_ne_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_neg_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_new_ones_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_conv_transpose3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_cosine_similarity_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_dropout2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_glu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_hardswish_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_huber_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_kl_div_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_local_response_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_logsigmoid_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_max_pool1d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_max_unpool2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_max_unpool3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_multi_head_attention_forward_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_pad_reflect_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_pairwise_distance_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_pdist_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_poisson_nll_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_silu_complex_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_silu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_softmin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_softmin_with_dtype_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nonzero_static_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_ones_like_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_outer_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_polar_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_positive_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_rand_like_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_randint_like_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_randn_like_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_ravel_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_reciprocal_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_repeat_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_reshape_as_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_resize_as__cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_resolve_neg_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_round_decimals_0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_rsub_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_sgn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_sigmoid_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_sinc_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_slice_scatter_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_sparse_mm_reduce_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_special_airy_ai_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_special_bessel_y0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_special_erfcx_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_special_log_ndtr_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_special_modified_bessel_k1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_squeeze_multiple_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_take_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_tan_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_tile_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_to_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_topk_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_triangular_solve_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_tril_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_triu_indices_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_unbind_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_var_mean_unbiased_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_var_unbiased_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_view_as_complex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_view_as_real_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_view_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_zeros_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_amax_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_amax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_amax_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_amin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_amin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_amin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_amin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_aminmax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_aminmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_aminmax_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_angle_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_angle_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_angle_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_any_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_any_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_arange_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_arange_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_arange_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_argmax_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_argmin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_argsort_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_argsort_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_argsort_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_argsort_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_argwhere_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_argwhere_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_as_strided_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_as_strided_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_as_strided_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_as_strided_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_as_strided_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_as_strided_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_as_strided_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_as_strided_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_as_strided_partial_views_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_as_strided_partial_views_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_as_strided_partial_views_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_as_strided_partial_views_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_as_strided_scatter_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_asin_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_asin_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_asin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_asin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_asinh_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_asinh_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_asinh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_asinh_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_asinh_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atan2_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atan2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atan2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atan_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atan_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atan_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atanh_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atanh_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atanh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atanh_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atleast_1d_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atleast_1d_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atleast_1d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atleast_1d_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atleast_2d_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atleast_2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atleast_3d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atleast_3d_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atleast_3d_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bfloat16_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bfloat16_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bfloat16_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bincount_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bincount_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bitwise_and_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bitwise_and_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bitwise_left_shift_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bitwise_left_shift_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bitwise_not_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bitwise_not_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bitwise_not_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bitwise_not_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bitwise_or_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bitwise_or_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bitwise_xor_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bitwise_xor_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bitwise_xor_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_block_diag_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_block_diag_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bmm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bool_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bool_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bool_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bool_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_broadcast_shapes_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_broadcast_tensors_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_broadcast_tensors_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_broadcast_to_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_broadcast_to_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_broadcast_to_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_broadcast_to_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_broadcast_to_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bucketize_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bucketize_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_byte_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_byte_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_byte_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_byte_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_byte_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cartesian_prod_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cartesian_prod_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cartesian_prod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cat_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cat_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cauchy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ceil_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ceil_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cfloat_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cfloat_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cfloat_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cfloat_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cfloat_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cfloat_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_chalf_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_chalf_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_chalf_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_chalf_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_char_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_char_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_char_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cholesky_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cholesky_inverse_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cholesky_solve_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cholesky_solve_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_chunk_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_chunk_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_chunk_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_chunk_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_clamp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_clamp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_clamp_max_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_clamp_max_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_clamp_max_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_clamp_max_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_clamp_min_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_clamp_min_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_clamp_min_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_column_stack_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_column_stack_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_column_stack_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_combinations_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_combinations_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_combinations_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_combinations_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_complex_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_complex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_conj_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_conj_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_conj_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_conj_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_conj_physical_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_conj_physical_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_conj_physical_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_conj_physical_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_constant_pad_nd_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_constant_pad_nd_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_constant_pad_nd_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_constant_pad_nd_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_constant_pad_nd_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_contiguous_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_contiguous_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_contiguous_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_contiguous_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_contiguous_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_copysign_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_copysign_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_copysign_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_copysign_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_copysign_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_corrcoef_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_corrcoef_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_corrcoef_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_corrcoef_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_corrcoef_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_corrcoef_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cos_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cos_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cos_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cos_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cos_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cosh_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cosh_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cosh_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_count_nonzero_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_count_nonzero_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cov_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cov_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cov_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cov_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cross_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cross_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cross_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cummax_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cummin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cummin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cummin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cummin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cumprod_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cumprod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cumprod_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cumsum_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cumsum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cumsum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_deg2rad_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_deg2rad_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_deg2rad_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diag_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diag_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diag_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diag_embed_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diag_embed_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagflat_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagflat_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagflat_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagonal_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagonal_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagonal_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagonal_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagonal_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagonal_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagonal_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagonal_scatter_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diff_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diff_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diff_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diff_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diff_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diff_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diff_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_digamma_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_digamma_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_digamma_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_dist_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_div_floor_rounding_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_div_floor_rounding_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_div_floor_rounding_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_div_floor_rounding_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_div_floor_rounding_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_div_no_rounding_mode_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_div_no_rounding_mode_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_div_no_rounding_mode_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_div_trunc_rounding_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_div_trunc_rounding_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_dot_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_double_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_double_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_double_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_double_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_dsplit_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_dsplit_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_dsplit_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_dsplit_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_dstack_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_dstack_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_dstack_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_einsum_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_einsum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_einsum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_empty_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_empty_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_empty_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_empty_like_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_empty_like_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_empty_like_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_empty_permuted_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_empty_permuted_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_empty_permuted_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_empty_permuted_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_empty_strided_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_empty_strided_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_empty_strided_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_empty_strided_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_eq_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_equal_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_equal_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_erf_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_erf_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_erfc_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_erfc_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_erfc_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_erfinv_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_erfinv_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_erfinv_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_exp2_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_exp2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_exp2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_exp2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_exp2_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_exp_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_exp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_exp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_exp_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_expand_as_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_expand_as_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_expand_as_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_expand_as_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_expand_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_expand_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_expand_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_expand_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_expm1_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_expm1_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_exponential_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_exponential_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_eye_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_eye_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_eye_cuda_float8_e4m3fnuz, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_eye_cuda_float8_e5m2fnuz, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_eye_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_fft2_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_fft2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_fft2_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_fft2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_fft_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_fft_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_fftn_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_fftn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_fftn_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_fftn_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_fftshift_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_fftshift_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_hfft2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_hfft2_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_hfft_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_hfftn_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_hfftn_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_hfftn_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_hfftn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_hfftn_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ifft2_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ifft2_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ifft2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ifft2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ifft2_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ifft_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ifft_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ifft_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ifftn_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ifftshift_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ifftshift_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ifftshift_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ifftshift_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ifftshift_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ihfft2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ihfft2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ihfft2_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ihfft_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ihfft_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ihfftn_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ihfftn_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_irfft2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_irfft_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_irfft_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_irfftn_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_irfftn_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_irfftn_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_rfft2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_rfft2_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_rfft_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_rfft_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_rfft_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_rfft_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fill_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fill_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fill_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_flatten_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_flatten_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_flatten_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_flip_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_flip_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fliplr_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fliplr_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_flipud_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_flipud_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_flipud_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_flipud_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_float_power_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_float_power_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_float_power_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_floor_divide_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fmax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fmax_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fmax_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fmin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fmin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fmin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fmin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fmin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fmod_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fmod_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fmod_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_frac_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_frexp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_full_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_full_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_full_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_full_like_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_full_like_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_full_like_cuda_uint16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_full_like_cuda_uint32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_gather_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_gather_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_gcd_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ge_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_geometric_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_geqrf_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_gradient_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_gradient_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_grid_sampler_2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_gt_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_gt_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_gt_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_half_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_half_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_half_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_half_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_hash_tensor_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_hash_tensor_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_hash_tensor_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_heaviside_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_histc_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_hsplit_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_hsplit_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_hsplit_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_hstack_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_hstack_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_hypot_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_i0_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_add_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_add_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_add_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_add_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_fill_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_fill_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_fill_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_fill_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_fill_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_reduce_amax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_reduce_amax_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_reduce_amin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_reduce_amin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_reduce_amin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_reduce_amin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_reduce_amin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_reduce_mean_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_reduce_prod_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_reduce_prod_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_select_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_select_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_select_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_select_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_inner_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_int_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_int_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isclose_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isclose_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isfinite_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isfinite_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isfinite_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isinf_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isinf_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isinf_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isinf_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isinf_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isnan_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isnan_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isneginf_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isneginf_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isposinf_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isreal_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isreal_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isreal_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isreal_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_item_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_item_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_item_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_item_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_item_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_jiterator_2inputs_2outputs_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_jiterator_2inputs_2outputs_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_jiterator_2inputs_2outputs_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_jiterator_2inputs_2outputs_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_jiterator_2inputs_2outputs_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_jiterator_4inputs_with_extra_args_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_jiterator_4inputs_with_extra_args_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_jiterator_4inputs_with_extra_args_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_jiterator_4inputs_with_extra_args_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_jiterator_binary_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_jiterator_binary_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_jiterator_binary_return_by_ref_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_jiterator_binary_return_by_ref_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_jiterator_unary_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_jiterator_unary_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_kron_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_kron_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_kron_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_kron_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_kron_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_kthvalue_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ldexp_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ldexp_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_le_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_le_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_lerp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_lerp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_lgamma_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_lgamma_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_cholesky_ex_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_cond_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_cross_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_cross_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_cross_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_diagonal_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_diagonal_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_diagonal_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_eig_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_eigh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_eigh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_eigvals_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_eigvalsh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_eigvalsh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_householder_product_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_householder_product_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_inv_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_inv_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_ldl_factor_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_ldl_factor_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_ldl_factor_ex_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_ldl_factor_ex_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_ldl_factor_ex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_ldl_solve_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_lstsq_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_lu_factor_ex_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_lu_solve_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_matrix_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_matrix_power_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_matrix_rank_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_matrix_rank_hermitian_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_multi_dot_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_multi_dot_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_norm_subgradients_at_zero_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_pinv_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_pinv_hermitian_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_pinv_singular_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_pinv_singular_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_solve_ex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_solve_triangular_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_svd_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_svdvals_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_svdvals_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_svdvals_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_tensorinv_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_vander_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_vander_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_vecdot_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_vecdot_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_vector_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linspace_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linspace_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linspace_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linspace_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linspace_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linspace_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linspace_tensor_overload_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log10_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log10_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log10_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log10_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log1p_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log1p_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log2_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log2_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log2_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log_softmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log_softmax_with_dtype_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log_softmax_with_dtype_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logaddexp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logcumsumexp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logcumsumexp_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logcumsumexp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logical_not_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logical_not_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logical_not_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logical_not_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logical_not_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logical_or_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logical_or_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logical_xor_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logical_xor_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logical_xor_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logical_xor_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logit_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logspace_tensor_overload_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logspace_tensor_overload_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logspace_tensor_overload_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logsumexp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logsumexp_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logsumexp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logsumexp_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_long_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_long_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_long_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_lt_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_lt_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_lt_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_lu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_lu_unpack_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_lu_unpack_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mH_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mH_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mH_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mH_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mT_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mT_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mT_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mT_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mT_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mT_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_amax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_amin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_amin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_argmax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_argmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_argmax_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_argmin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_cumprod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_cumprod_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_cumsum_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_cumsum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_fill_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_fill_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_log_softmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_logaddexp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_logsumexp_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_mean_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_median_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_normalize_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_prod_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_prod_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_prod_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_scatter_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_scatter_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_select_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_select_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_select_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_softmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_softmin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_std_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_std_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_std_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_std_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_sum_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_sum_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_sum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_sum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_sum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_var_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_var_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_var_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_matmul_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_matmul_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_matmul_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_matrix_exp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_matrix_exp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_matrix_exp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_max_binary_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_max_binary_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_max_binary_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_max_pool2d_with_indices_backward_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_max_reduction_no_dim_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_max_reduction_with_dim_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_max_reduction_with_dim_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mean_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_median_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_meshgrid_list_of_tensors_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_meshgrid_list_of_tensors_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_meshgrid_variadic_tensors_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_meshgrid_variadic_tensors_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_meshgrid_variadic_tensors_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_meshgrid_variadic_tensors_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_min_binary_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_min_binary_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_min_binary_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_min_binary_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_min_binary_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_min_binary_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_min_reduction_no_dim_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_min_reduction_no_dim_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_min_reduction_no_dim_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_min_reduction_no_dim_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_min_reduction_no_dim_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_min_reduction_with_dim_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_min_reduction_with_dim_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_min_reduction_with_dim_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_minimum_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_minimum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_minimum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_minimum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_movedim_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_movedim_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_movedim_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_movedim_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_movedim_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_msort_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_msort_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_msort_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mul_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mul_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mul_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_multinomial_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mv_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mv_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mvlgamma_mvlgamma_p_1_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mvlgamma_mvlgamma_p_3_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nan_to_num_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nan_to_num_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nan_to_num_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nanmedian_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nanmedian_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nansum_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nansum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nansum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nansum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_narrow_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_narrow_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_narrow_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_narrow_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_narrow_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_narrow_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_narrow_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_narrow_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_native_batch_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_native_dropout_backward_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_native_dropout_backward_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ne_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ne_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ne_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ne_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_neg_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_neg_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_neg_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_empty_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_empty_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_empty_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_empty_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_empty_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_empty_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_empty_strided_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_empty_strided_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_full_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_full_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_full_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_full_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_full_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_ones_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_ones_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_ones_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_ones_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_zeros_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_zeros_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_zeros_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_zeros_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_adaptive_avg_pool3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_adaptive_avg_pool3d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_adaptive_avg_pool3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_adaptive_max_pool1d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_adaptive_max_pool1d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_adaptive_max_pool2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_adaptive_max_pool3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_adaptive_max_pool3d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_alpha_dropout_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_alpha_dropout_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_avg_pool1d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_avg_pool1d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_avg_pool1d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_avg_pool2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_batch_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_batch_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_bilinear_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_bilinear_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_binary_cross_entropy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_binary_cross_entropy_with_logits_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_binary_cross_entropy_with_logits_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_celu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_celu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_channel_shuffle_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_channel_shuffle_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_channel_shuffle_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_channel_shuffle_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_channel_shuffle_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_conv1d_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_conv3d_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_conv_transpose1d_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_conv_transpose2d_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_conv_transpose3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_conv_transpose3d_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_cosine_embedding_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_cosine_embedding_loss_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_cosine_similarity_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_ctc_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_dropout2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_dropout_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_dropout_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_dropout_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_feature_alpha_dropout_with_train_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_feature_alpha_dropout_without_train_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_feature_alpha_dropout_without_train_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_fractional_max_pool2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_fractional_max_pool3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_grid_sample_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_group_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_hardshrink_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_hardsigmoid_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_hardsigmoid_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_hardsigmoid_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_hardtanh_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_hinge_embedding_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_huber_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_instance_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_instance_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_interpolate_area_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_interpolate_bicubic_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_interpolate_bilinear_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_interpolate_linear_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_interpolate_trilinear_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_interpolate_trilinear_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_kl_div_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_l1_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_layer_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_layer_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_leaky_relu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_linear_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_linear_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_logsigmoid_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_margin_ranking_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_margin_ranking_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_margin_ranking_loss_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_margin_ranking_loss_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_max_pool1d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_max_pool2d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_max_unpool1d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_max_unpool1d_grad_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_max_unpool2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_max_unpool2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_max_unpool2d_grad_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_mish_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_mish_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_mish_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_multi_head_attention_forward_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_multi_head_attention_forward_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_multi_margin_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_multilabel_margin_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_multilabel_margin_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_nll_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_normalize_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_normalize_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_normalize_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_circular_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_circular_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_circular_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_circular_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_constant_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_constant_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_constant_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_constant_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_constant_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_constant_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_reflect_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_reflect_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_replicate_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_replicate_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_replicate_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_replicate_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_replicate_negative_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_replicate_negative_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_replicate_negative_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_replicate_negative_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_replicate_negative_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pairwise_distance_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pdist_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pixel_unshuffle_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_poisson_nll_loss_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_prelu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_prelu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_relu6_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_relu6_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_relu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_relu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_relu_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_relu_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_relu_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_rms_norm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_rms_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_rms_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_rrelu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_selu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_selu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_silu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_smooth_l1_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_soft_margin_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_softmin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_softmin_with_dtype_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_softmin_with_dtype_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_softmin_with_dtype_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_softsign_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_softsign_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_tanhshrink_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_tanhshrink_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_tanhshrink_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_tanhshrink_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_threshold_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_threshold_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_triplet_margin_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_triplet_margin_loss_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_triplet_margin_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_triplet_margin_loss_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_triplet_margin_with_distance_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_triplet_margin_with_distance_loss_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_upsample_bilinear_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_upsample_bilinear_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_upsample_nearest_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nonzero_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nonzero_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nonzero_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nonzero_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nonzero_static_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nonzero_static_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nonzero_static_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_norm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_norm_fro_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_norm_inf_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_norm_inf_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_norm_inf_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_norm_nuc_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_normal_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_normal_number_mean_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ones_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ones_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ones_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ones_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ones_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ones_like_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ones_like_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ones_like_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ones_like_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ones_like_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ones_like_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ones_like_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ones_like_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_outer_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_outer_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_permute_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_permute_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_permute_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_permute_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_permute_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_permute_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_permute_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_permute_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_permute_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_pinverse_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_pinverse_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_pinverse_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_polygamma_polygamma_n_0_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_polygamma_polygamma_n_0_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_polygamma_polygamma_n_0_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_polygamma_polygamma_n_1_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_polygamma_polygamma_n_1_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_polygamma_polygamma_n_1_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_polygamma_polygamma_n_1_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_polygamma_polygamma_n_1_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_polygamma_polygamma_n_1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_polygamma_polygamma_n_2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_polygamma_polygamma_n_3_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_polygamma_polygamma_n_3_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_polygamma_polygamma_n_3_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_polygamma_polygamma_n_3_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_polygamma_polygamma_n_3_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_polygamma_polygamma_n_4_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_polygamma_polygamma_n_4_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_polygamma_polygamma_n_4_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_positive_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_prod_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_prod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_prod_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_put_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_put_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_qr_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_quantile_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_rad2deg_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_rad2deg_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_rad2deg_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_rand_like_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_randint_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_randint_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_randint_like_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_randint_like_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_randint_like_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_randn_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_randn_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_randn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_randn_like_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_randn_like_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_randn_like_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ravel_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ravel_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ravel_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ravel_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ravel_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_real_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_real_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_real_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_real_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_real_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_reciprocal_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_reciprocal_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_reciprocal_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_reciprocal_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_reciprocal_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_remainder_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_renorm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_repeat_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_repeat_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_repeat_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_repeat_interleave_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_repeat_interleave_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_repeat_interleave_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_repeat_interleave_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_reshape_as_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_reshape_as_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_reshape_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_reshape_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_reshape_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_resize__cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_resize__cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_resize__cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_resize__cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_resize_as__cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_resolve_conj_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_resolve_conj_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_resolve_conj_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_resolve_neg_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_resolve_neg_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_roll_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_roll_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_rot90_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_rot90_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_round_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_round_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_round_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_round_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_round_decimals_3_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_round_decimals_neg_3_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_round_decimals_neg_3_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_rsqrt_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_rsqrt_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_rsqrt_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_rsqrt_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_rsqrt_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_rsub_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_rsub_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_rsub_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_rsub_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_rsub_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scalar_tensor_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scalar_tensor_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scalar_tensor_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_add_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_add_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_add_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_reduce_amax_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_reduce_amin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_reduce_amin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_reduce_amin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_reduce_mean_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_reduce_mean_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_reduce_prod_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_reduce_prod_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_reduce_prod_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_reduce_sum_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_searchsorted_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_searchsorted_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_searchsorted_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_select_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_select_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_select_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_select_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_select_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_select_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_select_scatter_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_select_scatter_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sgn_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sgn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sgn_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sgn_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_short_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_short_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_short_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sigmoid_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sigmoid_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sigmoid_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sign_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sign_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sign_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sign_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sign_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_signal_windows_exponential_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_signal_windows_general_hamming_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_signal_windows_hamming_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_signal_windows_hann_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_signal_windows_nuttall_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_signbit_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sin_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sinc_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sinc_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sinc_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sinh_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sinh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sinh_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_slice_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_slice_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_slice_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_slice_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_slice_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_slice_scatter_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_softmax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_softmax_with_dtype_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_softmax_with_dtype_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sort_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sort_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sort_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sparse_mm_reduce_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_airy_ai_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_bessel_j0_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_bessel_j1_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_bessel_j1_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_bessel_y0_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_bessel_y1_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_bessel_y1_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_bessel_y1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_chebyshev_polynomial_t_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_chebyshev_polynomial_t_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_chebyshev_polynomial_t_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_chebyshev_polynomial_u_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_chebyshev_polynomial_u_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_chebyshev_polynomial_u_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_chebyshev_polynomial_u_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_chebyshev_polynomial_v_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_chebyshev_polynomial_w_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_chebyshev_polynomial_w_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_entr_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_entr_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_entr_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_hermite_polynomial_h_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_hermite_polynomial_h_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_hermite_polynomial_h_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_hermite_polynomial_he_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_hermite_polynomial_he_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_i0e_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_i0e_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_i1_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_i1_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_i1e_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_i1e_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_i1e_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_laguerre_polynomial_l_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_laguerre_polynomial_l_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_legendre_polynomial_p_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_log_ndtr_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_modified_bessel_i0_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_modified_bessel_i0_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_modified_bessel_i0_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_modified_bessel_i0_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_modified_bessel_i1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_modified_bessel_i1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_modified_bessel_k0_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_modified_bessel_k0_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_modified_bessel_k1_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_ndtr_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_ndtr_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_ndtr_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_ndtr_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_ndtri_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_ndtri_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_polygamma_special_polygamma_n_0_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_polygamma_special_polygamma_n_0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_polygamma_special_polygamma_n_0_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_polygamma_special_polygamma_n_0_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_polygamma_special_polygamma_n_0_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_scaled_modified_bessel_k0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_scaled_modified_bessel_k1_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_scaled_modified_bessel_k1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_shifted_chebyshev_polynomial_t_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_shifted_chebyshev_polynomial_u_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_shifted_chebyshev_polynomial_u_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_shifted_chebyshev_polynomial_u_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_shifted_chebyshev_polynomial_w_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_spherical_bessel_j0_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_spherical_bessel_j0_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_spherical_bessel_j0_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_spherical_bessel_j0_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_xlog1py_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_xlog1py_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_xlog1py_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_zeta_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_zeta_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_split_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_split_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_split_list_args_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_split_list_args_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_split_list_args_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_split_with_sizes_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_split_with_sizes_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_split_with_sizes_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_split_with_sizes_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_split_with_sizes_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sqrt_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sqrt_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sqrt_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_square_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_square_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_square_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_squeeze_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_squeeze_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_squeeze_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_squeeze_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_squeeze_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_squeeze_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_stack_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_stack_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_stack_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_stack_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_stack_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_std_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_std_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_std_mean_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_std_mean_unbiased_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_std_mean_unbiased_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_stft_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sub_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sub_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sub_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sub_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sum_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sum_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sum_to_size_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sum_to_size_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sum_to_size_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sum_to_size_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_svd_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_t_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_t_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_t_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_t_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_take_along_dim_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_take_along_dim_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_take_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_take_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_take_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_take_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tan_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tan_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tan_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tanh_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tanh_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tanh_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tensor_split_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tensor_split_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tensor_split_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tensordot_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tile_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tile_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tile_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_to_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_to_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_to_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_to_sparse_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_to_sparse_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_torch_ops_aten__flash_attention_forward_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_torch_ops_aten__safe_softmax_default_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_trace_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_trace_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_trace_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_transpose_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_transpose_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_transpose_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_transpose_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_transpose_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_transpose_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_transpose_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_trapezoid_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_trapezoid_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_trapezoid_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_trapz_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_trapz_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_triangular_solve_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tril_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tril_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_triu_indices_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_true_divide_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_true_divide_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_true_divide_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_trunc_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_trunc_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unbind_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unbind_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unbind_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unbind_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unbind_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unflatten_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unflatten_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unflatten_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unflatten_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unflatten_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unflatten_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unfold_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unfold_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unfold_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unfold_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unfold_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unfold_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_uniform_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_uniform_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unique_consecutive_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unique_consecutive_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unique_consecutive_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unique_consecutive_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unique_consecutive_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unique_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unique_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unravel_index_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unravel_index_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unsafe_chunk_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unsafe_split_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unsqueeze_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unsqueeze_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unsqueeze_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unsqueeze_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unsqueeze_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unsqueeze_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unsqueeze_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unsqueeze_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unsqueeze_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_var_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_var_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_var_mean_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_var_mean_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_var_mean_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_var_mean_unbiased_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_var_mean_unbiased_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_var_unbiased_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_var_unbiased_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_vdot_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_view_as_complex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_view_as_complex_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_view_as_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_view_as_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_view_as_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_view_as_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_view_as_real_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_view_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_view_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_view_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_view_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_view_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_view_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_vsplit_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_vsplit_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_vsplit_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_vsplit_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_vsplit_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_vsplit_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_vstack_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_vstack_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_where_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_where_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_where_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_xlogy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_zero__cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_zeros_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_zeros_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_zeros_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_zeros_like_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_zeros_like_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_H_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_H_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_H_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_T_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_T_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_T_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___getitem___cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___getitem___cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___getitem___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___radd___cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___radd___cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___radd___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rand___cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rdiv___cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rdiv___cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rdiv___cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rdiv___cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rdiv___cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rmatmul___cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rmatmul___cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rmatmul___cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rmatmul___cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rmod___cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rmod___cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rmod___cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rmul___cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rmul___cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rmul___cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rmul___cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___ror___cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___ror___cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___ror___cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rsub___cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rsub___cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rsub___cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rsub___cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rxor___cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__batch_norm_with_update_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__chunk_cat_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__chunk_cat_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__chunk_cat_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__chunk_cat_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__chunk_cat_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__chunk_cat_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__chunk_cat_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_abs_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_abs_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_acos_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_acos_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_acos_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_add_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_add_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_add_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_addcdiv_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_addcmul_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_addcmul_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_addcmul_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_asin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_atan_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_atan_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_ceil_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_ceil_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_ceil_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_clamp_max_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_clamp_max_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_clamp_max_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_clamp_min_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_clamp_min_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_clamp_min_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_clamp_min_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_clamp_min_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_cos_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_cos_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_div_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_div_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_erf_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_erf_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_erf_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_erf_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_erfc_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_erfc_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_erfc_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_exp_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_exp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_exp_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_exp_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_exp_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_expm1_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_expm1_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_expm1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_expm1_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_floor_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_floor_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_floor_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_frac_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_frac_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_frac_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_frac_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_lerp_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_lgamma_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_lgamma_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_lgamma_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_lgamma_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_lgamma_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_lgamma_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_lgamma_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_lgamma_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_log10_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_log10_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_log1p_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_log1p_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_log1p_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_log1p_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_log1p_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_log2_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_log2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_log2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_log_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_log_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_log_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_log_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_maximum_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_maximum_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_maximum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_minimum_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_minimum_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_mul_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_mul_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_mul_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_mul_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_neg_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_neg_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_neg_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_neg_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_neg_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_norm_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_norm_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_norm_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_pow_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_pow_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_reciprocal_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_reciprocal_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_reciprocal_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_reciprocal_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_reciprocal_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_reciprocal_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_round_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_round_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_round_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_round_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_rsqrt_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_rsqrt_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sigmoid_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sigmoid_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sigmoid_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sigmoid_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sigmoid_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sign_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sign_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sinh_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sinh_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sinh_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sqrt_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sub_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sub_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sub_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sub_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sub_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_tan_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_tan_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_tanh_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_tanh_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_tanh_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_trunc_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_trunc_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_trunc_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_zero_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_zero_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_zero_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_zero_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_zero_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__native_batch_norm_legit_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__segment_reduce_offsets_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__segment_reduce_offsets_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__softmax_backward_data_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__unsafe_masked_index_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__unsafe_masked_index_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__unsafe_masked_index_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__unsafe_masked_index_put_accumulate_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__unsafe_masked_index_put_accumulate_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__unsafe_masked_index_put_accumulate_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_abs_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_abs_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_acos_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_acos_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_acosh_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_acosh_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_acosh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_acosh_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_acosh_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_add_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_add_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_add_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_addcdiv_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_addcdiv_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_addcmul_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_addcmul_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_addcmul_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_addmm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_addmm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_addmv_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_addr_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_alias_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_alias_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_alias_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_alias_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_H_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_T_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides___rpow___cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides__foreach_add_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides__foreach_addcmul_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides__foreach_atan_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides__foreach_clamp_min_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides__foreach_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides__foreach_cosh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides__foreach_floor_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides__foreach_max_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides__foreach_minimum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides__foreach_round_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides__unsafe_masked_index_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_addcdiv_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_addcmul_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_any_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_asinh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_atan_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_baddbmm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_bincount_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_bitwise_and_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_bitwise_left_shift_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_bitwise_not_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_bmm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_cartesian_prod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_cauchy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_char_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_chunk_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_clone_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_constant_pad_nd_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_contiguous_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_cos_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_cov_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_cross_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_cumprod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_cumsum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_diag_embed_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_digamma_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_empty_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_equal_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_exp2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_exp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_fft_hfft2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_fft_ifftn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_fft_irfft_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_fmod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_frac_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_geqrf_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_grid_sampler_2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_gt_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_half_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_histc_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_i0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_igamma_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_index_reduce_amin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_isfinite_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_jiterator_4inputs_with_extra_args_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_ldexp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_lerp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_linalg_inv_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_linalg_lstsq_grad_oriented_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_linalg_matrix_power_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_linalg_solve_ex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_linalg_solve_triangular_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_linalg_svdvals_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_linalg_vander_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_linalg_vecdot_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_linspace_tensor_overload_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_log_softmax_with_dtype_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_logical_not_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_logical_or_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_logical_xor_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_logit_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_logspace_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_logsumexp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_lu_solve_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_lu_unpack_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_masked_logaddexp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_masked_median_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_masked_scatter_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_masked_sum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_masked_var_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_matmul_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_max_reduction_with_dim_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_maximum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_min_binary_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_min_reduction_with_dim_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_minimum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_mm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_movedim_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_msort_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nanmedian_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_narrow_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_native_layer_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nextafter_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_adaptive_max_pool3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_celu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_cosine_similarity_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_dropout_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_embedding_bag_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_fractional_max_pool2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_fractional_max_pool3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_gaussian_nll_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_gelu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_grid_sample_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_hardswish_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_l1_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_max_unpool1d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_max_unpool3d_grad_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_mse_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_pairwise_distance_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_silu_complex_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_smooth_l1_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_softmin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_unfold_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_upsample_nearest_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_normal_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_normal_number_mean_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_ormqr_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_pca_lowrank_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_pinverse_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_polar_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_polygamma_polygamma_n_3_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_pow_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_randint_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_randn_like_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_ravel_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_reshape_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_round_decimals_neg_3_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_short_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_sign_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_signal_windows_bartlett_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_signbit_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_sin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_slice_scatter_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_sparse_mm_reduce_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_special_airy_ai_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_special_bessel_j0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_special_bessel_j1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_special_bessel_y1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_special_entr_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_special_i0e_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_special_modified_bessel_k0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_special_scaled_modified_bessel_k0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_special_spherical_bessel_j0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_special_xlog1py_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_std_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_stft_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_sub_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_sum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_t_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_take_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_tan_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_topk_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_trace_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_transpose_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_transpose_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_tril_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_trunc_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_unbind_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_unflatten_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_unfold_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_unsqueeze_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_view_as_complex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_zeros_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_allclose_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_allclose_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_allclose_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_amax_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_amax_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_amin_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_amin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_amin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_aminmax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_aminmax_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_any_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_any_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_any_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_arange_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_arange_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_arange_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_argmin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_argmin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_argsort_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_argsort_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_argsort_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_argsort_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_argsort_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_argwhere_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_argwhere_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_as_strided_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_as_strided_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_as_strided_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_as_strided_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_as_strided_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_as_strided_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_as_strided_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_as_strided_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_as_strided_partial_views_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_asin_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_asin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_asin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_asinh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_asinh_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atan2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atan2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atan2_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atan2_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atan_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atan_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atanh_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atanh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atanh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atleast_1d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atleast_1d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atleast_1d_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atleast_2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atleast_2d_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atleast_2d_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atleast_3d_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atleast_3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atleast_3d_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_baddbmm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_baddbmm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_baddbmm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bfloat16_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bfloat16_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bincount_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bitwise_and_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bitwise_left_shift_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bitwise_not_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bitwise_or_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bitwise_or_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bitwise_right_shift_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bitwise_xor_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_block_diag_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_block_diag_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_block_diag_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bmm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bmm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bool_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bool_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bool_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_broadcast_shapes_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_broadcast_tensors_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_broadcast_tensors_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_broadcast_to_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_broadcast_to_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_broadcast_to_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bucketize_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bucketize_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bucketize_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_byte_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_byte_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_byte_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_byte_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cartesian_prod_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cartesian_prod_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cartesian_prod_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cat_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cat_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cat_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cat_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cdist_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cdouble_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cdouble_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ceil_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ceil_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ceil_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cfloat_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cfloat_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cfloat_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_chalf_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_chalf_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_chalf_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_chalf_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_char_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_char_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_char_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_char_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_char_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_char_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cholesky_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cholesky_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_chunk_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_chunk_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_chunk_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_chunk_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_chunk_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_clamp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_clamp_max_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_clamp_max_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_clamp_max_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_clone_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_clone_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_clone_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_clone_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_column_stack_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_column_stack_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_column_stack_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_column_stack_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_column_stack_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_combinations_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_combinations_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_combinations_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_complex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_complex_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_conj_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_conj_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_conj_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_conj_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_conj_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_conj_physical_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_conj_physical_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_conj_physical_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_conj_physical_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_conj_physical_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_constant_pad_nd_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_constant_pad_nd_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_constant_pad_nd_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_contiguous_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_contiguous_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_contiguous_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_copysign_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_corrcoef_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_corrcoef_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_corrcoef_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_corrcoef_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_corrcoef_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_corrcoef_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cos_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cos_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cos_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cosh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_count_nonzero_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_count_nonzero_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_count_nonzero_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_count_nonzero_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cov_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cov_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cross_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cross_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cross_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cummax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cummax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cummax_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cummin_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cumprod_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cumprod_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cumsum_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cumsum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cumulative_trapezoid_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cumulative_trapezoid_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_deg2rad_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_deg2rad_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_deg2rad_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_deg2rad_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diag_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diag_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diag_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diag_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diag_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diag_embed_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diag_embed_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diag_embed_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diag_embed_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diagflat_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diagflat_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diagflat_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diagflat_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diagonal_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diagonal_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diagonal_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diagonal_scatter_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diagonal_scatter_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diagonal_scatter_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diff_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diff_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diff_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_digamma_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_digamma_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_dist_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_dist_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_div_floor_rounding_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_div_floor_rounding_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_div_no_rounding_mode_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_div_no_rounding_mode_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_div_no_rounding_mode_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_div_no_rounding_mode_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_div_no_rounding_mode_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_div_no_rounding_mode_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_div_trunc_rounding_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_div_trunc_rounding_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_dot_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_dsplit_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_dstack_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_dstack_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_dstack_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_dstack_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_empty_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_empty_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_empty_like_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_empty_like_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_empty_like_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_empty_like_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_empty_permuted_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_empty_permuted_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_empty_permuted_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_empty_permuted_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_empty_strided_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_empty_strided_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_empty_strided_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_empty_strided_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_eq_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_eq_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_eq_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_equal_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_equal_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_erf_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_erf_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_erf_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_erf_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_erfc_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_erfinv_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_erfinv_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_erfinv_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_exp2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_exp_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_exp_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_exp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_expand_as_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_expand_as_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_expand_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_expand_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_expand_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_expand_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_expand_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_expand_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_expand_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_expm1_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_expm1_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_exponential_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_exponential_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_exponential_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_eye_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_eye_cuda_float8_e5m2, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_eye_cuda_float8_e5m2fnuz, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_eye_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_fft2_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_fft_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_fft_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_fft_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_fftn_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_fftshift_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_fftshift_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_fftshift_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_fftshift_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_hfft2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_hfft2_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_hfft2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_hfft_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_hfft_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_hfftn_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_hfftn_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_hfftn_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_hfftn_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_hfftn_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ifft2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ifft2_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ifft2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ifft2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ifft_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ifft_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ifft_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ifft_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ifft_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ifftn_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ifftn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ifftn_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ifftshift_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ifftshift_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ifftshift_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ihfft2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ihfftn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ihfftn_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_irfft2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_irfft2_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_irfft_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_irfft_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_irfft_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_irfftn_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_irfftn_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_irfftn_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_rfft2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_rfft2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_rfftn_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_rfftn_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_rfftn_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_rfftn_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fill_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fill_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fill_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_flatten_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_flatten_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_flatten_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_flip_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_flip_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fliplr_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fliplr_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fliplr_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fliplr_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fliplr_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_flipud_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_flipud_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_flipud_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_flipud_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_float_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_float_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_float_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_float_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_float_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_float_power_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_float_power_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_floor_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_floor_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_floor_divide_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_floor_divide_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_floor_divide_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fmax_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fmax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fmax_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fmax_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fmin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fmod_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fmod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fmod_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fmod_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fmod_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_frac_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_frac_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_frac_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_full_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_full_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_full_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_full_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_full_like_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_full_like_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_gather_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_gather_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_gather_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_gather_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_gather_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_gcd_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ge_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ge_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ge_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_geometric_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_geometric_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_geometric_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_geometric_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_gradient_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_gradient_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_gradient_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_grid_sampler_2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_grid_sampler_2d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_gt_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_gt_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_half_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_half_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_half_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_hash_tensor_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_hash_tensor_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_heaviside_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_heaviside_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_histc_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_hsplit_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_hsplit_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_hsplit_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_hstack_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_hstack_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_i0_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_i0_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_i0_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_igamma_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_imag_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_add_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_add_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_add_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_add_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_fill_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_fill_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_put_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_put_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_put_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_put_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_put_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_reduce_amax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_reduce_amax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_reduce_amax_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_reduce_amax_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_reduce_amin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_reduce_amin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_reduce_amin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_reduce_mean_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_reduce_mean_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_reduce_mean_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_reduce_prod_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_reduce_prod_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_reduce_prod_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_reduce_prod_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_select_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_select_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_select_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_int_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_int_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_int_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_int_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_int_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isclose_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isclose_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isclose_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isfinite_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isfinite_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isinf_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isinf_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isinf_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isinf_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isinf_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isinf_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isinf_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isnan_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isnan_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isnan_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isneginf_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isposinf_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isposinf_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isreal_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isreal_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isreal_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isreal_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isreal_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isreal_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_item_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_item_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_item_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_item_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_jiterator_2inputs_2outputs_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_jiterator_2inputs_2outputs_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_jiterator_2inputs_2outputs_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_jiterator_2inputs_2outputs_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_jiterator_4inputs_with_extra_args_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_jiterator_binary_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_jiterator_binary_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_jiterator_binary_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_jiterator_binary_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_jiterator_binary_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_jiterator_binary_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_jiterator_binary_return_by_ref_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_jiterator_binary_return_by_ref_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_jiterator_binary_return_by_ref_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_jiterator_binary_return_by_ref_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_jiterator_binary_return_by_ref_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_jiterator_unary_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_jiterator_unary_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_jiterator_unary_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_kron_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_kthvalue_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_kthvalue_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_kthvalue_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_kthvalue_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_kthvalue_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_lcm_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ldexp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ldexp_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ldexp_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ldexp_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_le_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_le_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_le_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_lerp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_lerp_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_lerp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_lerp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_lgamma_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_lgamma_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_lgamma_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_cholesky_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_cholesky_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_cholesky_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_cond_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_cross_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_cross_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_cross_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_cross_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_det_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_diagonal_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_diagonal_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_diagonal_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_eigh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_eigvals_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_eigvalsh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_householder_product_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_inv_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_inv_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_ldl_factor_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_ldl_factor_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_lstsq_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_lstsq_grad_oriented_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_lu_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_lu_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_lu_factor_ex_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_matrix_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_matrix_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_matrix_power_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_matrix_rank_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_matrix_rank_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_matrix_rank_hermitian_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_norm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_norm_subgradients_at_zero_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_norm_subgradients_at_zero_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_pinv_hermitian_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_pinv_singular_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_pinv_singular_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_qr_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_slogdet_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_solve_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_svd_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_svd_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_tensorinv_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_tensorinv_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_tensorsolve_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_tensorsolve_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_vander_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_vander_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_vander_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_vecdot_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_vecdot_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_vector_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_vector_norm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_vector_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linspace_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linspace_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linspace_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linspace_tensor_overload_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log10_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log10_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log10_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log10_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log1p_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log1p_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log1p_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log1p_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log2_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log2_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log2_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log_softmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log_softmax_with_dtype_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logaddexp2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logdet_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logical_and_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logical_and_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logical_and_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logical_and_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logical_not_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logical_not_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logical_or_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logical_xor_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logical_xor_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logical_xor_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logical_xor_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logit_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logit_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logspace_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logspace_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logspace_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logspace_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logspace_tensor_overload_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logsumexp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logsumexp_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logsumexp_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_long_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_lt_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_lt_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_lu_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_lu_solve_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_lu_unpack_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_lu_unpack_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mH_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mH_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mH_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mH_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mT_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mT_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mT_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_amax_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_amin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_amin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_amin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_argmax_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_argmin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_cumprod_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_cumprod_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_cumsum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_cumsum_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_fill_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_fill_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_logaddexp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_logaddexp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_logsumexp_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_logsumexp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_mean_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_mean_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_median_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_normalize_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_prod_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_prod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_prod_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_scatter_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_scatter_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_select_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_select_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_select_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_softmax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_sum_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_sum_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_matmul_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_matmul_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_matrix_exp_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_matrix_exp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_matrix_exp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_max_binary_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_max_pool2d_with_indices_backward_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_max_reduction_no_dim_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_max_reduction_no_dim_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_max_reduction_with_dim_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_max_reduction_with_dim_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_max_reduction_with_dim_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_max_reduction_with_dim_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_maximum_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_maximum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_maximum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mean_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_median_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_median_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_median_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_median_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_meshgrid_list_of_tensors_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_meshgrid_variadic_tensors_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_meshgrid_variadic_tensors_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_meshgrid_variadic_tensors_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_meshgrid_variadic_tensors_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_meshgrid_variadic_tensors_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_min_binary_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_min_binary_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_min_binary_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_min_reduction_no_dim_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_min_reduction_no_dim_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_min_reduction_with_dim_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_min_reduction_with_dim_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_min_reduction_with_dim_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_minimum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_minimum_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_minimum_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mode_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mode_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_movedim_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_movedim_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_msort_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mul_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mul_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_multinomial_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_multinomial_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_multinomial_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mv_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mvlgamma_mvlgamma_p_1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mvlgamma_mvlgamma_p_3_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mvlgamma_mvlgamma_p_3_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mvlgamma_mvlgamma_p_3_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nan_to_num_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nan_to_num_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nan_to_num_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nan_to_num_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nanmean_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nanmean_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nanmedian_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nanmedian_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nansum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nansum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_narrow_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_narrow_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_narrow_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_narrow_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_native_batch_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_native_dropout_backward_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_native_layer_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_neg_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_neg_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_neg_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_neg_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_neg_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_new_empty_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_new_empty_strided_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_new_full_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_new_full_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_new_full_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_new_ones_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_new_ones_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_new_ones_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_new_ones_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_new_zeros_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_new_zeros_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_new_zeros_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_adaptive_avg_pool2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_adaptive_avg_pool3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_adaptive_avg_pool3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_adaptive_max_pool1d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_adaptive_max_pool2d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_avg_pool1d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_batch_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_batch_norm_without_cudnn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_binary_cross_entropy_with_logits_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_binary_cross_entropy_with_logits_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_channel_shuffle_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_channel_shuffle_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_channel_shuffle_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_channel_shuffle_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_channel_shuffle_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_conv1d_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_conv2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_conv2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_conv3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_conv_transpose1d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_conv_transpose3d_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_cosine_embedding_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_cosine_embedding_loss_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_cosine_embedding_loss_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_cosine_similarity_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_dropout_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_elu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_embedding_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_feature_alpha_dropout_with_train_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_feature_alpha_dropout_without_train_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_feature_alpha_dropout_without_train_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_fractional_max_pool2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_fractional_max_pool3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_gelu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_glu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_glu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_group_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_group_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_hardshrink_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_hardshrink_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_hardswish_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_hardswish_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_hardtanh_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_huber_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_instance_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_interpolate_area_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_interpolate_bilinear_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_interpolate_nearest-exact_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_interpolate_nearest-exact_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_interpolate_nearest_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_l1_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_l1_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_leaky_relu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_linear_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_linear_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_local_response_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_margin_ranking_loss_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_max_pool1d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_max_pool1d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_max_pool1d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_max_pool2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_max_pool2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_max_unpool1d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_max_unpool2d_grad_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_max_unpool3d_grad_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_max_unpool3d_grad_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_mish_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_mish_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_mse_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_mse_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_multi_head_attention_forward_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_multi_margin_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_nll_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_normalize_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_normalize_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_one_hot_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pad_circular_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pad_circular_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pad_constant_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pad_constant_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pad_constant_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pad_reflect_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pad_reflect_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pad_reflect_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pad_replicate_negative_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pairwise_distance_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pairwise_distance_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pdist_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pixel_shuffle_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pixel_shuffle_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pixel_shuffle_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pixel_unshuffle_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pixel_unshuffle_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pixel_unshuffle_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_poisson_nll_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_poisson_nll_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_poisson_nll_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_poisson_nll_loss_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_prelu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_prelu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_relu6_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_relu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_relu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_relu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_relu_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_rms_norm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_rms_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_rrelu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_rrelu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_rrelu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_scaled_dot_product_attention_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_scaled_dot_product_attention_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_scaled_dot_product_attention_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_silu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_smooth_l1_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_softmin_with_dtype_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_softmin_with_dtype_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_softmin_with_dtype_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_softplus_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_softplus_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_softsign_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_softsign_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_softsign_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_softsign_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_tanhshrink_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_tanhshrink_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_tanhshrink_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_tanhshrink_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_threshold_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_threshold_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_triplet_margin_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_triplet_margin_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_triplet_margin_loss_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_triplet_margin_with_distance_loss_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_unfold_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_unfold_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nonzero_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nonzero_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nonzero_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nonzero_static_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nonzero_static_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_norm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_norm_fro_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_norm_inf_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_norm_inf_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_norm_inf_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_norm_nuc_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_normal_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_normal_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_normal_in_place_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_normal_in_place_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_normal_in_place_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_normal_number_mean_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ones_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ones_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ones_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ones_like_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ones_like_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ones_like_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ones_like_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_outer_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_outer_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_pca_lowrank_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_permute_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_permute_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_permute_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_permute_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_permute_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_pinverse_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_polar_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_polygamma_polygamma_n_0_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_polygamma_polygamma_n_0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_polygamma_polygamma_n_2_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_polygamma_polygamma_n_2_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_polygamma_polygamma_n_3_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_polygamma_polygamma_n_3_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_polygamma_polygamma_n_4_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_polygamma_polygamma_n_4_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_polygamma_polygamma_n_4_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_positive_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_positive_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_positive_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_pow_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_pow_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_prod_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_prod_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_prod_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_put_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_put_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_rad2deg_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_rad2deg_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_rand_like_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_randint_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_randint_like_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_randint_like_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_randint_like_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_randn_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_randn_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_randn_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_randn_like_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ravel_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ravel_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_real_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_real_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_real_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_real_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_reciprocal_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_reciprocal_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_reciprocal_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_remainder_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_renorm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_renorm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_repeat_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_repeat_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_repeat_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_repeat_interleave_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_repeat_interleave_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_repeat_interleave_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_reshape_as_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_reshape_as_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_reshape_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_reshape_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_resize__cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_resize__cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_resize_as__cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_resize_as__cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_resize_as__cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_resolve_conj_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_resolve_conj_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_resolve_conj_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_resolve_neg_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_resolve_neg_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_roll_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_roll_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_rot90_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_rot90_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_rot90_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_rot90_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_round_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_round_decimals_neg_3_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_round_decimals_neg_3_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_rsqrt_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_rsqrt_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_rsqrt_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_rsqrt_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_rsqrt_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_rsqrt_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_rsub_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_rsub_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_rsub_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_rsub_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_rsub_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scalar_tensor_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scalar_tensor_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scalar_tensor_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scalar_tensor_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_add_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_add_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_add_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_reduce_amax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_reduce_amax_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_reduce_amax_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_reduce_amin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_reduce_amin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_reduce_prod_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_reduce_prod_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_reduce_prod_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_reduce_sum_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_searchsorted_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_select_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_select_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_select_scatter_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_select_scatter_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_select_scatter_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sgn_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_short_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sigmoid_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sigmoid_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sigmoid_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sigmoid_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sign_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_signal_windows_bartlett_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_signal_windows_blackman_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_signal_windows_blackman_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_signal_windows_cosine_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_signal_windows_gaussian_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_signal_windows_general_cosine_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_signbit_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_signbit_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_signbit_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_signbit_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sinc_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sinc_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sinh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sinh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sinh_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sinh_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sinh_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_slice_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_slice_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_slice_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_slice_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_slice_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_slice_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_slice_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_slice_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_slice_scatter_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_softmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_softmax_with_dtype_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_softmax_with_dtype_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_softmax_with_dtype_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sort_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sort_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sort_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sparse_mm_reduce_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sparse_sampled_addmm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_airy_ai_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_airy_ai_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_airy_ai_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_bessel_j0_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_bessel_j0_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_bessel_j0_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_bessel_j1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_bessel_j1_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_bessel_j1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_bessel_j1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_bessel_y0_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_bessel_y0_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_bessel_y1_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_bessel_y1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_bessel_y1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_chebyshev_polynomial_t_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_chebyshev_polynomial_t_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_chebyshev_polynomial_t_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_chebyshev_polynomial_u_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_chebyshev_polynomial_u_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_chebyshev_polynomial_v_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_chebyshev_polynomial_v_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_chebyshev_polynomial_v_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_entr_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_entr_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_entr_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_entr_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_erfcx_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_erfcx_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_erfcx_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_hermite_polynomial_h_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_hermite_polynomial_h_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_hermite_polynomial_he_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_i0e_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_i0e_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_i0e_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_i0e_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_i1_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_i1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_i1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_i1_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_i1_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_i1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_i1e_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_i1e_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_i1e_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_laguerre_polynomial_l_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_laguerre_polynomial_l_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_laguerre_polynomial_l_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_legendre_polynomial_p_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_log_ndtr_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_modified_bessel_i0_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_modified_bessel_i1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_modified_bessel_i1_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_modified_bessel_k0_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_ndtr_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_ndtr_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_ndtr_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_polygamma_special_polygamma_n_0_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_polygamma_special_polygamma_n_0_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_polygamma_special_polygamma_n_0_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_shifted_chebyshev_polynomial_t_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_shifted_chebyshev_polynomial_t_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_shifted_chebyshev_polynomial_v_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_shifted_chebyshev_polynomial_v_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_shifted_chebyshev_polynomial_w_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_shifted_chebyshev_polynomial_w_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_xlog1py_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_zeta_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_zeta_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_split_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_split_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_split_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_split_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_split_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_split_list_args_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_split_list_args_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_split_with_sizes_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_split_with_sizes_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_split_with_sizes_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_split_with_sizes_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_split_with_sizes_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_split_with_sizes_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_split_with_sizes_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_split_with_sizes_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_split_with_sizes_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sqrt_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sqrt_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sqrt_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sqrt_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sqrt_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_square_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_square_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_square_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_square_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_squeeze_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_squeeze_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_squeeze_multiple_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_squeeze_multiple_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_stack_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_stack_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_std_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_std_mean_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_std_mean_unbiased_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sub_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sub_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sub_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sub_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sub_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sum_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sum_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sum_to_size_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_svd_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_svd_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_t_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_t_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_t_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_t_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_t_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_take_along_dim_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_take_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tan_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tan_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tan_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tanh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tanh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tanh_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tanh_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tensor_split_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tensor_split_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tile_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_to_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_to_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_to_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_to_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_to_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_to_sparse_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_to_sparse_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_to_sparse_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_to_sparse_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_to_sparse_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_to_sparse_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_topk_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_torch_ops_aten__efficient_attention_forward_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_torch_ops_aten__flash_attention_forward_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_torch_ops_aten__flash_attention_forward_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_torch_ops_aten__safe_softmax_default_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_torch_ops_aten__safe_softmax_default_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_torch_ops_aten__safe_softmax_default_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_torch_ops_aten__safe_softmax_default_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_torch_ops_aten__safe_softmax_default_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_trace_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_transpose_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_transpose_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_transpose_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_transpose_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_transpose_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_trapezoid_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_trapezoid_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_trapezoid_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_trapz_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tril_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tril_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tril_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_triu_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_triu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_triu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_triu_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_triu_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_triu_indices_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_triu_indices_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_true_divide_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_true_divide_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_true_divide_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_true_divide_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_true_divide_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_trunc_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_trunc_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_trunc_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unbind_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unbind_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unbind_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unbind_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unbind_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unbind_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unbind_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unbind_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unflatten_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unflatten_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unflatten_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unfold_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unfold_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unfold_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unfold_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unfold_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unfold_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unfold_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unfold_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_uniform_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_uniform_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unique_consecutive_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unique_consecutive_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unique_consecutive_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unique_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unique_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unique_cuda_uint16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unique_cuda_uint32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unique_cuda_uint64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unique_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unravel_index_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unravel_index_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unsafe_chunk_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unsafe_split_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unsafe_split_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unsafe_split_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unsafe_split_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unsqueeze_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unsqueeze_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unsqueeze_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unsqueeze_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unsqueeze_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unsqueeze_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_var_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_var_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_var_mean_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_var_mean_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_var_mean_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_var_mean_unbiased_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_var_unbiased_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_var_unbiased_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_vdot_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_vdot_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_vdot_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_view_as_complex_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_view_as_complex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_view_as_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_view_as_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_view_as_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_view_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_view_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_view_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_view_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_view_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_vsplit_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_vsplit_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_vsplit_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_vstack_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_vstack_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_vstack_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_vstack_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_where_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_where_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_where_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_where_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_xlogy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_zero__cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_zero__cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_zero__cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_zeros_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_zeros_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_zeros_like_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_embedding_bag_byte_unpack_cuda, test/test_meta.py::TestMetaCUDA::test_embedding_bag_dense_backward_mode_2_cuda, test/test_meta.py::TestMetaCUDA::test_empty_quantized_cuda, test/test_meta.py::TestMetaCUDA::test_group_norm_backward_output_mask0_cuda, test/test_meta.py::TestMetaCUDA::test_group_norm_backward_output_mask3_cuda, test/test_meta.py::TestMetaCUDA::test_group_norm_backward_output_mask5_cuda, test/test_meta.py::TestMetaCUDA::test_layer_norm_backward_output_mask3_cuda, test/test_meta.py::TestMetaCUDA::test_map_location_deserialize_cuda, test/test_meta.py::TestMetaCUDA::test_meta_autograd_no_error_cuda, test/test_meta.py::TestMetaCUDA::test_meta_inplace_H_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_H_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_H_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_H_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_H_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_T_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_T_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_T_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_T_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_T_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_T_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace___getitem___cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace___getitem___cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace___getitem___cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace___getitem___cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace___getitem___cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace___radd___cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace___radd___cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace___radd___cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rdiv___cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rdiv___cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rmod___cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rmod___cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rmod___cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rmul___cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rmul___cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace___ror___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rpow___cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rpow___cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rsub___cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rsub___cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rsub___cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rsub___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rxor___cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace__chunk_cat_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__chunk_cat_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__chunk_cat_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__chunk_cat_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_abs_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_acos_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_acos_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_acos_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_acos_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_acos_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_add_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_addcdiv_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_addcdiv_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_addcdiv_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_addcdiv_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_addcdiv_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_addcdiv_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_addcmul_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_addcmul_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_addcmul_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_addcmul_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_asin_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_asin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_asin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_asin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_atan_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_ceil_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_clamp_max_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_clamp_max_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_clamp_max_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_clamp_max_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_clamp_max_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_clamp_max_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_clamp_min_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_clamp_min_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_clamp_min_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_clamp_min_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_cosh_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_div_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_div_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_div_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_div_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_div_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_erfc_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_erfc_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_erfc_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_exp_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_exp_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_expm1_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_expm1_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_expm1_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_expm1_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_expm1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_floor_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_floor_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_floor_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_frac_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_frac_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_frac_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_frac_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_lerp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_lerp_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_lerp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_lerp_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_lgamma_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_lgamma_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_lgamma_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_log10_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_log10_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_log10_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_log10_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_log1p_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_log1p_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_log2_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_log2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_log2_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_log2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_log_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_log_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_log_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_log_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_max_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_max_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_max_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_max_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_max_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_maximum_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_maximum_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_maximum_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_maximum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_maximum_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_maximum_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_maximum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_minimum_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_minimum_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_mul_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_mul_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_norm_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_pow_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_pow_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_pow_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_reciprocal_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_reciprocal_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_reciprocal_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_round_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_round_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_round_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_round_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_round_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_rsqrt_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_rsqrt_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_rsqrt_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_rsqrt_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sigmoid_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sigmoid_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sign_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sin_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sinh_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sinh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sinh_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sinh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sqrt_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sqrt_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sqrt_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sqrt_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sub_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sub_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_tan_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_tan_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_tanh_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_tanh_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_tanh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_tanh_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_trunc_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_zero_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace__native_batch_norm_legit_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__segment_reduce_lengths_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__segment_reduce_lengths_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__segment_reduce_offsets_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__softmax_backward_data_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__softmax_backward_data_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__unsafe_masked_index_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__unsafe_masked_index_put_accumulate_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__unsafe_masked_index_put_accumulate_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__unsafe_masked_index_put_accumulate_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__upsample_bilinear2d_aa_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__upsample_bilinear2d_aa_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_abs_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_abs_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_abs_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_acos_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_acos_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_acos_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_acos_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_acosh_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_acosh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_acosh_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_add_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_add_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_add_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_addcdiv_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_addcdiv_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_addcmul_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_addcmul_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_addcmul_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_addcmul_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_addcmul_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_addmm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_addmm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_addmm_decomposed_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_addmv_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_addr_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_addr_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_addr_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_alias_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_alias_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_alias_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_alias_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_all_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_all_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_all_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_all_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_all_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_allclose_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_allclose_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_allclose_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_amax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_amax_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_amax_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_amax_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_amin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_amin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_amin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_aminmax_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_aminmax_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_aminmax_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_aminmax_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_angle_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_angle_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_angle_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_any_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_any_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_any_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_any_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_arange_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_arange_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_arange_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_argmax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_argmax_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_argmax_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_argsort_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_argwhere_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_argwhere_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_argwhere_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_argwhere_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_argwhere_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_as_strided_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_as_strided_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_as_strided_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_as_strided_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_as_strided_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_as_strided_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_as_strided_partial_views_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_as_strided_partial_views_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_as_strided_partial_views_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_as_strided_partial_views_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_asin_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_asinh_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_asinh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_asinh_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_asinh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_asinh_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atan2_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atan2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atan_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atan_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atanh_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atleast_1d_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atleast_1d_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atleast_2d_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atleast_2d_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atleast_2d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atleast_2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atleast_3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atleast_3d_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atleast_3d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atleast_3d_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_baddbmm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bfloat16_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bfloat16_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bincount_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bincount_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bitwise_and_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bitwise_left_shift_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bitwise_not_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bitwise_not_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bitwise_not_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bitwise_or_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bitwise_right_shift_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bitwise_xor_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bitwise_xor_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_block_diag_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_block_diag_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_block_diag_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_block_diag_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_block_diag_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bool_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bool_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bool_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_broadcast_tensors_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_broadcast_tensors_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_broadcast_tensors_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_broadcast_tensors_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_broadcast_tensors_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_broadcast_to_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_broadcast_to_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_broadcast_to_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_broadcast_to_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_broadcast_to_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_broadcast_to_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bucketize_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bucketize_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_byte_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_byte_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_byte_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_byte_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cartesian_prod_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cartesian_prod_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cartesian_prod_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cartesian_prod_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cat_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cat_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cdouble_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cdouble_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cdouble_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ceil_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cfloat_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cfloat_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cfloat_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cfloat_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cfloat_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cfloat_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_chalf_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_chalf_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_char_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_char_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cholesky_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cholesky_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cholesky_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cholesky_solve_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_chunk_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_chunk_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_chunk_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_clamp_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_clamp_max_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_clamp_max_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_clamp_max_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_clamp_min_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_clamp_min_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_clone_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_clone_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_clone_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_column_stack_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_column_stack_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_combinations_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_combinations_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_combinations_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_conj_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_conj_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_conj_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_conj_physical_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_conj_physical_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_conj_physical_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_constant_pad_nd_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_constant_pad_nd_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_constant_pad_nd_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_constant_pad_nd_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_contiguous_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_contiguous_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_contiguous_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_contiguous_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_copysign_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_copysign_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_corrcoef_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_corrcoef_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_corrcoef_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_corrcoef_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cos_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cos_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cos_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cos_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cos_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cosh_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cosh_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cosh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cosh_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cosh_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cosh_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_count_nonzero_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_count_nonzero_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_count_nonzero_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_count_nonzero_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cov_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cross_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cross_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cross_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cross_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cummax_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cummax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cummax_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cummin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cummin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cumprod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cumprod_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cumsum_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cumsum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cumsum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cumulative_trapezoid_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cumulative_trapezoid_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cumulative_trapezoid_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_deg2rad_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_deg2rad_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_deg2rad_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_deg2rad_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_deg2rad_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diag_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diag_embed_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diag_embed_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diagflat_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diagflat_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diagflat_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diagonal_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diagonal_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diagonal_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diagonal_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diagonal_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diagonal_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diagonal_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diagonal_scatter_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diagonal_scatter_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diagonal_scatter_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diagonal_scatter_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diagonal_scatter_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diff_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diff_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_digamma_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_digamma_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_digamma_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_dist_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_dist_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_dist_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_div_floor_rounding_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_div_floor_rounding_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_div_floor_rounding_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_div_floor_rounding_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_div_no_rounding_mode_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_div_no_rounding_mode_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_div_no_rounding_mode_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_div_no_rounding_mode_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_div_no_rounding_mode_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_div_trunc_rounding_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_dot_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_double_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_double_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_double_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_double_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_dsplit_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_dsplit_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_dstack_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_dstack_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_einsum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_empty_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_empty_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_empty_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_empty_like_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_empty_like_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_empty_like_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_empty_permuted_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_empty_permuted_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_empty_permuted_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_empty_permuted_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_empty_strided_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_empty_strided_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_empty_strided_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_eq_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_eq_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_eq_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_eq_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_eq_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_eq_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_equal_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_equal_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_equal_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_equal_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_erf_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_erf_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_erfc_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_erfc_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_erfinv_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_erfinv_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_erfinv_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_exp2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_exp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_expand_as_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_expand_as_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_expand_as_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_expand_as_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_expand_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_expand_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_expand_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_expand_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_expand_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_expand_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_expand_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_expand_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_expm1_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_expm1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_expm1_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_exponential_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_eye_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_eye_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_eye_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_eye_cuda_float8_e4m3fn, test/test_meta.py::TestMetaCUDA::test_meta_inplace_eye_cuda_float8_e4m3fnuz, test/test_meta.py::TestMetaCUDA::test_meta_inplace_eye_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_fft2_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_fft2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_fft2_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_fft_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_fft_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_fft_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_fftn_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_fftn_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_fftshift_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_fftshift_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_hfft2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_hfft_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_hfft_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_hfftn_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_hfftn_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_hfftn_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ifft_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ifft_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ifft_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ifft_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ifftn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ifftn_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ifftn_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ifftshift_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ihfft2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ihfft2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ihfft_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ihfftn_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ihfftn_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ihfftn_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ihfftn_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_irfft2_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_irfft2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_irfft_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_irfft_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_irfft_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_irfft_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_irfftn_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_irfftn_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_irfftn_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_irfftn_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_irfftn_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_rfft_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_rfft_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_rfft_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_rfft_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_rfftn_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_rfftn_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_rfftn_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fill_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fill_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fill_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fill_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_flatten_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_flatten_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_flip_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_flip_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fliplr_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fliplr_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_flipud_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_flipud_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_flipud_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_float_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_float_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_float_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_float_power_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_float_power_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_floor_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_floor_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_floor_divide_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fmax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fmax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fmax_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fmin_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fmin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fmod_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fmod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fmod_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fmod_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_frexp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_frexp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_full_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_full_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_full_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_full_like_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_full_like_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_full_like_cuda_uint32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_gather_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_gather_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ge_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ge_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ge_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_geqrf_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_gradient_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_gradient_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_grid_sampler_2d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_gt_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_gt_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_half_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_half_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_half_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_half_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_hash_tensor_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_hash_tensor_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_hash_tensor_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_heaviside_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_heaviside_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_hsplit_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_hsplit_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_hsplit_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_hsplit_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_hstack_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_hstack_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_i0_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_i0_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_i0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_i0_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_i0_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_imag_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_add_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_add_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_fill_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_fill_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_fill_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_put_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_put_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_put_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_reduce_amax_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_reduce_amin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_reduce_amin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_reduce_mean_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_reduce_mean_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_reduce_prod_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_select_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_select_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_select_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_select_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_select_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_select_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_inner_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_inner_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_inner_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_int_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_int_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_int_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isclose_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isclose_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isclose_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isfinite_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isfinite_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isfinite_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isinf_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isnan_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isnan_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isnan_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isnan_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isneginf_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isposinf_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isreal_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isreal_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isreal_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_item_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_item_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_2inputs_2outputs_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_2inputs_2outputs_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_2inputs_2outputs_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_2inputs_2outputs_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_4inputs_with_extra_args_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_4inputs_with_extra_args_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_4inputs_with_extra_args_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_4inputs_with_extra_args_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_binary_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_binary_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_binary_return_by_ref_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_unary_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_unary_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_unary_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_unary_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_kron_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_kron_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_kron_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_kron_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_lcm_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ldexp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ldexp_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_le_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_le_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_lerp_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_lerp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_lgamma_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_lgamma_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_lgamma_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_cholesky_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_cholesky_ex_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_cholesky_ex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_cholesky_ex_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_cond_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_cross_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_cross_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_det_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_det_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_diagonal_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_eig_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_inv_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_ldl_factor_ex_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_ldl_factor_ex_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_ldl_factor_ex_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_ldl_solve_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_ldl_solve_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_lstsq_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_lstsq_grad_oriented_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_lu_factor_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_lu_factor_ex_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_lu_factor_ex_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_matrix_rank_hermitian_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_matrix_rank_hermitian_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_multi_dot_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_norm_subgradients_at_zero_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_pinv_hermitian_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_qr_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_qr_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_solve_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_solve_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_solve_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_solve_ex_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_solve_ex_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_tensorinv_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_tensorsolve_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_vander_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_vander_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_vander_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_vander_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_vander_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_vander_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_vecdot_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_vecdot_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_vector_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linspace_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linspace_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linspace_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linspace_tensor_overload_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linspace_tensor_overload_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linspace_tensor_overload_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linspace_tensor_overload_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log10_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log1p_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log1p_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log1p_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log2_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log_normal_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log_softmax_with_dtype_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log_softmax_with_dtype_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log_softmax_with_dtype_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logcumsumexp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logdet_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logical_and_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logical_not_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logical_xor_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logical_xor_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logical_xor_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logical_xor_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logical_xor_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logit_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logit_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logit_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logit_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logit_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logit_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logspace_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logspace_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logspace_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logspace_tensor_overload_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logspace_tensor_overload_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logspace_tensor_overload_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logspace_tensor_overload_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_long_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_long_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_lt_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_lu_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_lu_solve_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_lu_unpack_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mH_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mH_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mH_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mT_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mT_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_amax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_amax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_amax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_amin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_amin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_amin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_argmax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_argmax_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_argmin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_argmin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_cumprod_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_cumprod_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_cumsum_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_fill_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_fill_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_logaddexp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_logsumexp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_logsumexp_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_logsumexp_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_logsumexp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_logsumexp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_mean_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_mean_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_median_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_normalize_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_prod_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_scatter_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_scatter_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_scatter_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_scatter_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_scatter_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_scatter_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_select_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_softmax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_softmax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_softmin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_std_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_std_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_std_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_sum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_sum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_sum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_sum_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_sum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_var_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_var_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_var_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_var_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_var_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_matmul_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_matmul_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_matrix_exp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_matrix_exp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_max_binary_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_max_binary_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_max_binary_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_max_binary_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_max_binary_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_max_reduction_no_dim_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_max_reduction_no_dim_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_max_reduction_no_dim_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_max_reduction_with_dim_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_max_reduction_with_dim_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_max_reduction_with_dim_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_maximum_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_maximum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_meshgrid_list_of_tensors_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_meshgrid_list_of_tensors_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_meshgrid_list_of_tensors_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_meshgrid_list_of_tensors_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_meshgrid_list_of_tensors_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_meshgrid_variadic_tensors_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_meshgrid_variadic_tensors_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_meshgrid_variadic_tensors_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_meshgrid_variadic_tensors_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_min_binary_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_min_binary_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_min_binary_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_min_binary_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_min_reduction_no_dim_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_min_reduction_no_dim_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_min_reduction_no_dim_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_min_reduction_with_dim_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_minimum_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_minimum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_minimum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_minimum_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_minimum_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_minimum_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mode_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mode_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_movedim_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_movedim_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_movedim_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_movedim_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_movedim_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_msort_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_msort_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_msort_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_msort_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mul_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mul_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_multinomial_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mv_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mv_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mvlgamma_mvlgamma_p_1_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mvlgamma_mvlgamma_p_1_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mvlgamma_mvlgamma_p_1_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mvlgamma_mvlgamma_p_1_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mvlgamma_mvlgamma_p_5_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mvlgamma_mvlgamma_p_5_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nan_to_num_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nan_to_num_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nan_to_num_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nanmean_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nanmean_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nanmean_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nanmedian_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nanmedian_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nanmedian_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nanmedian_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nanquantile_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nansum_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nansum_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nansum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_narrow_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_narrow_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_narrow_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_narrow_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_narrow_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_narrow_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_narrow_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_narrow_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_narrow_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_native_dropout_backward_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_native_layer_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_native_layer_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ne_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ne_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ne_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ne_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_neg_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_neg_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_neg_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_neg_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_neg_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_new_empty_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_new_empty_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_new_empty_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_new_empty_strided_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_new_empty_strided_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_new_empty_strided_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_new_empty_strided_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_new_full_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_new_full_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_new_full_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_new_ones_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_new_ones_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_new_zeros_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_new_zeros_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_new_zeros_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_new_zeros_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_new_zeros_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nextafter_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_adaptive_avg_pool1d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_adaptive_avg_pool1d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_adaptive_avg_pool2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_adaptive_avg_pool3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_adaptive_avg_pool3d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_adaptive_avg_pool3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_adaptive_max_pool2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_adaptive_max_pool2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_alpha_dropout_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_avg_pool1d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_avg_pool2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_avg_pool2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_avg_pool3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_avg_pool3d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_batch_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_bilinear_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_binary_cross_entropy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_binary_cross_entropy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_binary_cross_entropy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_binary_cross_entropy_with_logits_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_celu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_channel_shuffle_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_channel_shuffle_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_channel_shuffle_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_channel_shuffle_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_channel_shuffle_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_channel_shuffle_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_conv2d_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_conv2d_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_conv3d_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_conv3d_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_conv3d_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_conv3d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_conv3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_conv_transpose1d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_conv_transpose3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_conv_transpose3d_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_conv_transpose3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_cosine_embedding_loss_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_cosine_embedding_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_cosine_embedding_loss_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_cosine_similarity_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_cross_entropy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_ctc_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_dropout3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_elu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_elu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_embedding_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_feature_alpha_dropout_with_train_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_feature_alpha_dropout_with_train_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_fractional_max_pool2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_fractional_max_pool3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_gaussian_nll_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_gelu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_grid_sample_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_group_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_hardsigmoid_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_hardswish_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_hinge_embedding_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_huber_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_instance_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_interpolate_area_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_interpolate_area_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_interpolate_area_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_interpolate_bilinear_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_interpolate_trilinear_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_l1_loss_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_l1_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_layer_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_leaky_relu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_linear_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_linear_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_logsigmoid_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_margin_ranking_loss_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_margin_ranking_loss_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_margin_ranking_loss_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_margin_ranking_loss_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_max_pool2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_max_pool2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_max_pool3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_max_unpool1d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_max_unpool1d_grad_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_max_unpool2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_max_unpool2d_grad_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_max_unpool2d_grad_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_max_unpool3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_max_unpool3d_grad_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_mish_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_mish_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_multi_head_attention_forward_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_multi_margin_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_multi_margin_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_multilabel_margin_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_multilabel_margin_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_multilabel_soft_margin_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_nll_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_normalize_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_normalize_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_normalize_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pad_circular_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pad_circular_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pad_circular_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pad_constant_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pad_constant_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pad_constant_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pad_constant_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pad_constant_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pad_reflect_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pad_reflect_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pad_reflect_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pad_replicate_negative_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pad_replicate_negative_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pad_replicate_negative_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pad_replicate_negative_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pad_replicate_negative_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pairwise_distance_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pixel_shuffle_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pixel_shuffle_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pixel_shuffle_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pixel_shuffle_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pixel_shuffle_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pixel_unshuffle_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pixel_unshuffle_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_poisson_nll_loss_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_prelu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_relu6_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_relu6_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_relu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_relu_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_relu_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_rms_norm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_rrelu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_scaled_dot_product_attention_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_scaled_dot_product_attention_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_selu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_selu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_smooth_l1_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_soft_margin_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_softmin_with_dtype_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_softplus_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_softplus_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_softshrink_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_softshrink_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_softsign_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_softsign_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_softsign_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_softsign_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_tanhshrink_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_tanhshrink_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_threshold_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_threshold_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_triplet_margin_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_triplet_margin_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_triplet_margin_loss_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_triplet_margin_loss_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_triplet_margin_loss_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_triplet_margin_with_distance_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_triplet_margin_with_distance_loss_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_triplet_margin_with_distance_loss_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_unfold_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_unfold_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_upsample_nearest_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nonzero_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nonzero_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nonzero_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nonzero_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nonzero_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nonzero_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nonzero_static_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nonzero_static_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_norm_fro_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_norm_fro_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_norm_fro_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_norm_inf_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_norm_inf_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_norm_inf_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_normal_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_normal_number_mean_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ones_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ones_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ones_like_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ones_like_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ones_like_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ones_like_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ones_like_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ones_like_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ormqr_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ormqr_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_outer_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_permute_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_permute_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_pinverse_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_polygamma_polygamma_n_0_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_polygamma_polygamma_n_0_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_polygamma_polygamma_n_0_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_polygamma_polygamma_n_1_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_polygamma_polygamma_n_1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_polygamma_polygamma_n_2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_polygamma_polygamma_n_3_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_polygamma_polygamma_n_3_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_polygamma_polygamma_n_3_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_polygamma_polygamma_n_3_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_polygamma_polygamma_n_4_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_polygamma_polygamma_n_4_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_polygamma_polygamma_n_4_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_polygamma_polygamma_n_4_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_positive_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_positive_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_pow_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_prod_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_prod_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_put_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_put_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_put_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_put_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_rad2deg_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_rad2deg_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_rand_like_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_rand_like_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_randint_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_randint_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_randint_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_randint_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_randint_like_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_randn_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ravel_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ravel_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_real_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_real_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_reciprocal_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_reciprocal_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_reciprocal_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_reciprocal_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_reciprocal_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_reciprocal_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_remainder_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_remainder_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_remainder_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_repeat_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_repeat_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_repeat_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_repeat_interleave_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_repeat_interleave_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_reshape_as_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_reshape_as_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_reshape_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_reshape_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_reshape_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_reshape_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_resize__cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_resize__cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_resize__cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_resize__cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_resize_as__cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_resolve_conj_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_resolve_conj_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_resolve_conj_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_resolve_conj_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_resolve_conj_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_resolve_conj_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_resolve_neg_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_resolve_neg_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_resolve_neg_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_resolve_neg_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_resolve_neg_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_resolve_neg_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_roll_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_roll_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_roll_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_rot90_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_rot90_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_rot90_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_rot90_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_rot90_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_round_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_round_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_round_decimals_0_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_round_decimals_0_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_round_decimals_3_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_round_decimals_neg_3_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_round_decimals_neg_3_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_rsqrt_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_rsqrt_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_rsqrt_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_rsqrt_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_rsub_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_rsub_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_rsub_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_rsub_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_rsub_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scalar_tensor_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scalar_tensor_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scalar_tensor_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scalar_tensor_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_add_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_add_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_reduce_amax_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_reduce_amax_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_reduce_amin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_reduce_amin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_reduce_amin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_reduce_amin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_reduce_amin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_reduce_mean_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_reduce_prod_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_reduce_sum_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_searchsorted_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_searchsorted_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_searchsorted_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_select_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_select_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_select_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_select_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_select_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_select_scatter_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_select_scatter_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_select_scatter_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_select_scatter_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_select_scatter_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sgn_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sgn_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_short_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_short_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_short_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sigmoid_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sigmoid_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sigmoid_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sign_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sign_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sign_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sign_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sign_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_signal_windows_cosine_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_signal_windows_gaussian_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_signal_windows_hann_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_signal_windows_hann_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_signbit_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_signbit_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_signbit_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sin_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sinc_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sinc_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sinc_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sinc_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sinh_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sinh_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_slice_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_slice_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_slice_scatter_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_softmax_with_dtype_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_softmax_with_dtype_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_softmax_with_dtype_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_softmax_with_dtype_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_softmax_with_dtype_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sort_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sort_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sparse_mm_reduce_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sparse_mm_reduce_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_airy_ai_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_bessel_j0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_bessel_j0_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_bessel_j0_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_bessel_j1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_bessel_j1_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_bessel_j1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_bessel_j1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_bessel_y1_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_bessel_y1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_bessel_y1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_chebyshev_polynomial_t_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_chebyshev_polynomial_t_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_chebyshev_polynomial_t_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_chebyshev_polynomial_t_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_chebyshev_polynomial_u_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_chebyshev_polynomial_w_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_chebyshev_polynomial_w_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_entr_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_entr_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_entr_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_entr_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_erfcx_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_hermite_polynomial_h_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_hermite_polynomial_h_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_hermite_polynomial_he_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_i0e_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_i0e_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_i1_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_i1_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_i1e_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_i1e_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_i1e_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_i1e_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_i1e_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_legendre_polynomial_p_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_legendre_polynomial_p_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_log_ndtr_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_modified_bessel_i0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_modified_bessel_i1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_modified_bessel_i1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_modified_bessel_i1_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_modified_bessel_i1_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_modified_bessel_i1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_modified_bessel_i1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_modified_bessel_k0_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_modified_bessel_k0_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_modified_bessel_k1_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_ndtr_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_ndtr_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_ndtri_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_ndtri_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_polygamma_special_polygamma_n_0_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_polygamma_special_polygamma_n_0_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_polygamma_special_polygamma_n_0_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_scaled_modified_bessel_k0_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_scaled_modified_bessel_k1_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_scaled_modified_bessel_k1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_shifted_chebyshev_polynomial_t_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_shifted_chebyshev_polynomial_t_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_shifted_chebyshev_polynomial_w_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_shifted_chebyshev_polynomial_w_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_spherical_bessel_j0_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_spherical_bessel_j0_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_xlog1py_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_zeta_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_zeta_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_split_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_split_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_split_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_split_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_split_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_split_list_args_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_split_list_args_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_split_list_args_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_split_list_args_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_split_with_sizes_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_split_with_sizes_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_split_with_sizes_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_split_with_sizes_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_split_with_sizes_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_split_with_sizes_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_split_with_sizes_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_split_with_sizes_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_split_with_sizes_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_split_with_sizes_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_split_with_sizes_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_split_with_sizes_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sqrt_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sqrt_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sqrt_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sqrt_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_squeeze_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_squeeze_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_squeeze_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_squeeze_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_squeeze_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_squeeze_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_squeeze_multiple_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_squeeze_multiple_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_squeeze_multiple_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_stack_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_stack_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_stack_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_stack_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_stack_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_stack_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_std_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_std_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_std_mean_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_std_mean_unbiased_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_stft_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sub_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sub_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sub_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sub_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sum_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sum_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sum_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sum_to_size_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sum_to_size_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sum_to_size_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_svd_lowrank_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_svd_lowrank_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_t_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_t_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_t_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_t_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_t_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_t_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_take_along_dim_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_take_along_dim_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_take_along_dim_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_take_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_take_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_take_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_take_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tan_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tan_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tan_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tan_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tan_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tanh_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tanh_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tanh_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tensor_split_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tensor_split_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tensor_split_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tensordot_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tensordot_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tensordot_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tile_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tile_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tile_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_to_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_to_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_to_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_to_sparse_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_to_sparse_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_topk_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_torch_ops_aten__flash_attention_forward_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_torch_ops_aten__safe_softmax_default_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_trace_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_trace_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_transpose_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_transpose_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_transpose_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_transpose_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_transpose_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_transpose_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_transpose_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_trapezoid_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_trapezoid_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_trapz_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_triangular_solve_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tril_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tril_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tril_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tril_indices_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tril_indices_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_triu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_triu_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_triu_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_triu_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_triu_indices_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_true_divide_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_true_divide_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_true_divide_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_true_divide_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_trunc_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_trunc_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_trunc_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unbind_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unbind_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unbind_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unbind_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unbind_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unbind_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unflatten_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unflatten_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unflatten_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unfold_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unfold_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unfold_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unfold_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unfold_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unfold_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_uniform_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unique_consecutive_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unique_consecutive_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unique_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unique_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unique_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unique_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unravel_index_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unsafe_chunk_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unsafe_chunk_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unsafe_split_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unsafe_split_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unsafe_split_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unsqueeze_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unsqueeze_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unsqueeze_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unsqueeze_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unsqueeze_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unsqueeze_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unsqueeze_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unsqueeze_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_var_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_var_mean_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_var_mean_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_var_mean_unbiased_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_var_unbiased_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_var_unbiased_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_vdot_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_vdot_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_view_as_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_view_as_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_view_as_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_view_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_view_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_view_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_view_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_view_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_view_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_view_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_vsplit_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_vsplit_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_vsplit_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_vsplit_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_vsplit_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_vsplit_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_vstack_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_where_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_where_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_where_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_where_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_where_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_where_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_where_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_where_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_xlogy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_zero__cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_zero__cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_zero__cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_zeros_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_zeros_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_zeros_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_zeros_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_zeros_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_zeros_like_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_zeros_like_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_H_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_T_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace___getitem___cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace___getitem___cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace___getitem___cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace___getitem___cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace___radd___cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace___radd___cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace___radd___cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace___radd___cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace___radd___cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rand___cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rdiv___cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rdiv___cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rmatmul___cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rmatmul___cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rmod___cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rmod___cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rmod___cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rmod___cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rmod___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rmul___cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rmul___cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace___ror___cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace___ror___cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace___ror___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rpow___cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rpow___cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rpow___cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rpow___cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rpow___cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rpow___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rsub___cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rsub___cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rsub___cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rsub___cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rsub___cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rsub___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rxor___cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__batch_norm_with_update_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__batch_norm_with_update_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__chunk_cat_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_abs_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_abs_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_acos_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_acos_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_acos_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_add_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_add_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_add_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_add_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_addcdiv_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_addcmul_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_addcmul_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_asin_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_asin_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_asin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_asin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_atan_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_atan_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_atan_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_ceil_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_ceil_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_ceil_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_ceil_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_clamp_max_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_clamp_max_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_clamp_max_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_clamp_max_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_clamp_min_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_cos_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_cos_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_cos_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_cosh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_cosh_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_cosh_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_div_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_div_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_erf_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_erf_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_erfc_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_exp_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_expm1_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_expm1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_floor_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_floor_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_floor_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_floor_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_frac_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_lerp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_lerp_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_lerp_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_lerp_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_lerp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_lgamma_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_lgamma_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_lgamma_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_log10_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_log10_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_log10_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_log10_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_log1p_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_log2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_log2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_log2_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_log_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_log_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_log_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_log_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_log_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_max_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_max_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_max_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_max_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_maximum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_maximum_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_maximum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_minimum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_minimum_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_mul_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_mul_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_mul_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_neg_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_neg_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_neg_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_neg_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_norm_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_pow_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_pow_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_pow_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_pow_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_pow_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_round_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_round_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_round_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_round_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_round_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_round_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_rsqrt_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sigmoid_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sign_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sinh_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sinh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sinh_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sinh_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sinh_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sqrt_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sqrt_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sqrt_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sub_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sub_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sub_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_tan_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_tan_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_tanh_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_trunc_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_trunc_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_trunc_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_zero_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_zero_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_zero_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__native_batch_norm_legit_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__native_batch_norm_legit_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__segment_reduce_offsets_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__softmax_backward_data_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__softmax_backward_data_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__unsafe_masked_index_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__unsafe_masked_index_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__unsafe_masked_index_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__unsafe_masked_index_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__unsafe_masked_index_put_accumulate_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__unsafe_masked_index_put_accumulate_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__unsafe_masked_index_put_accumulate_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_acos_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_acos_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_acos_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_acosh_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_acosh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_acosh_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_acosh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_acosh_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_add_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_add_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_add_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_add_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_addbmm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_addbmm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_addcmul_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_addcmul_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_addmm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_addmm_decomposed_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_addmv_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_addmv_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_addmv_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_addr_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_addr_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_addr_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_addr_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_addr_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_alias_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_alias_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_alias_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_all_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_all_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_allclose_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_allclose_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_allclose_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_allclose_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_amax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_amax_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_amin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_amin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_aminmax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_aminmax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_aminmax_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_angle_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_angle_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_angle_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_angle_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_angle_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_any_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_any_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_arange_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_arange_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_argmax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_argmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_argmin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_argsort_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_argwhere_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_argwhere_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_argwhere_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_argwhere_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_as_strided_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_as_strided_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_as_strided_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_as_strided_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_as_strided_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_as_strided_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_as_strided_partial_views_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_as_strided_partial_views_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_as_strided_partial_views_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_as_strided_partial_views_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_as_strided_partial_views_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_as_strided_scatter_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_as_strided_scatter_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_as_strided_scatter_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_asin_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_asin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_asin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_asinh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_asinh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atan2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atan2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atan2_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atan_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atan_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atan_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atan_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atanh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atanh_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atanh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atanh_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atleast_1d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atleast_1d_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atleast_3d_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atleast_3d_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atleast_3d_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atleast_3d_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_baddbmm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_baddbmm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bfloat16_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bfloat16_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bfloat16_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bfloat16_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bfloat16_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bitwise_and_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bitwise_and_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bitwise_and_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bitwise_left_shift_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bitwise_not_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bitwise_or_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bitwise_xor_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bitwise_xor_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bitwise_xor_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_block_diag_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_block_diag_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_block_diag_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_block_diag_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_block_diag_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bmm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bool_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_broadcast_tensors_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_broadcast_tensors_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_broadcast_tensors_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_broadcast_tensors_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_broadcast_to_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bucketize_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_byte_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_byte_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_byte_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_byte_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_byte_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cartesian_prod_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cartesian_prod_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cartesian_prod_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cartesian_prod_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cat_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cat_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cat_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cauchy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cauchy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cdouble_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cdouble_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cfloat_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cfloat_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_chalf_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_chalf_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_char_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_char_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_char_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_char_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_char_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_char_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cholesky_inverse_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cholesky_solve_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cholesky_solve_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_chunk_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_chunk_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_chunk_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_chunk_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_chunk_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_clamp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_clamp_max_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_clamp_min_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_clone_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_clone_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_column_stack_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_column_stack_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_column_stack_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_column_stack_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_combinations_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_combinations_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_combinations_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_combinations_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_combinations_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_combinations_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_conj_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_conj_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_conj_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_conj_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_conj_physical_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_conj_physical_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_conj_physical_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_conj_physical_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_constant_pad_nd_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_constant_pad_nd_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_constant_pad_nd_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_constant_pad_nd_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_contiguous_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_contiguous_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_contiguous_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_copysign_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_copysign_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_copysign_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_copysign_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_corrcoef_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_corrcoef_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cos_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cos_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cos_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cosh_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_count_nonzero_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_count_nonzero_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_count_nonzero_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_count_nonzero_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cov_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cov_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cov_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cov_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cross_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cross_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cummax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cummax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cummax_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cummin_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cummin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cumprod_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cumsum_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cumsum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cumulative_trapezoid_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cumulative_trapezoid_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cumulative_trapezoid_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_deg2rad_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_deg2rad_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_deg2rad_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diag_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diag_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diag_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diag_embed_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diag_embed_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diagflat_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diagflat_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diagflat_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diagflat_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diagonal_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diagonal_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diagonal_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diagonal_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diagonal_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diagonal_scatter_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diagonal_scatter_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diff_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_digamma_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_digamma_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_digamma_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_digamma_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_digamma_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_dist_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_div_no_rounding_mode_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_div_no_rounding_mode_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_div_no_rounding_mode_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_div_no_rounding_mode_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_div_no_rounding_mode_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_div_trunc_rounding_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_div_trunc_rounding_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_dot_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_dot_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_double_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_double_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_double_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_dsplit_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_dsplit_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_dsplit_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_dsplit_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_dsplit_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_dstack_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_dstack_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_dstack_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_empty_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_empty_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_empty_like_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_empty_like_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_empty_like_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_empty_like_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_empty_permuted_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_empty_permuted_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_empty_strided_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_empty_strided_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_eq_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_eq_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_eq_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_equal_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_equal_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_erf_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_erf_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_erfc_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_erfc_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_erfinv_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_erfinv_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_exp2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_exp2_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_exp2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_exp2_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_exp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_exp_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_exp_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_expand_as_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_expand_as_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_expand_as_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_expand_as_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_expand_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_expand_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_expand_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_expand_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_expm1_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_expm1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_exponential_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_exponential_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_eye_cuda_float8_e5m2fnuz, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_fft2_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_fft2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_fft2_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_fft_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_fft_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_fft_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_fft_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_fftn_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_fftn_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_fftn_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_fftshift_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_fftshift_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_fftshift_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_fftshift_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_fftshift_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_fftshift_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_fftshift_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_hfft2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_hfft2_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_hfft2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_hfft2_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_hfft_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_hfft_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_hfft_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_hfftn_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_hfftn_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_hfftn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ifft_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ifft_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ifftn_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ifftshift_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ifftshift_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ifftshift_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ifftshift_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ihfft2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ihfft_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ihfftn_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ihfftn_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ihfftn_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_irfft2_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_irfft2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_irfftn_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_irfftn_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_irfftn_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_irfftn_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_rfft2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_rfft2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_rfft2_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_rfft_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_rfft_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_rfft_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_rfft_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_rfftn_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_rfftn_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_rfftn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_rfftn_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_rfftn_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fill_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fill_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_flatten_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_flatten_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_flip_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_flip_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_flip_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fliplr_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fliplr_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_flipud_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_flipud_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_flipud_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_flipud_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_float_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_float_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_float_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_float_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_float_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_float_power_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_float_power_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_float_power_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_float_power_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_floor_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_floor_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_floor_divide_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fmax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fmin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fmin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fmod_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fmod_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fmod_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_frac_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_full_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_full_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_full_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_full_like_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_full_like_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_full_like_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_full_like_cuda_uint32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_gather_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_gather_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_gather_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_gcd_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ge_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ge_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ge_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ge_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_geometric_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_geqrf_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_geqrf_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_gradient_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_gradient_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_grid_sampler_2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_grid_sampler_2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_grid_sampler_2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_gt_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_gt_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_half_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_half_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_half_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_hash_tensor_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_hash_tensor_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_heaviside_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_histc_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_histc_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_hsplit_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_hsplit_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_hsplit_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_hsplit_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_hsplit_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_hsplit_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_hstack_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_hstack_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_hstack_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_i0_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_i0_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_i0_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_igammac_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_imag_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_add_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_fill_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_fill_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_put_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_put_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_put_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_put_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_reduce_amax_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_reduce_amin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_reduce_amin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_reduce_mean_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_reduce_mean_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_reduce_prod_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_select_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_select_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_inner_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_inner_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_int_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_int_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_int_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_int_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isclose_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isclose_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isclose_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isfinite_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isfinite_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isfinite_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isfinite_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isfinite_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isfinite_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isfinite_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isinf_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isinf_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isinf_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isinf_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isnan_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isnan_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isnan_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isnan_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isneginf_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isneginf_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isposinf_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isposinf_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isreal_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isreal_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isreal_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_istft_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_item_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_item_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_item_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_item_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_item_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_jiterator_2inputs_2outputs_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_jiterator_2inputs_2outputs_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_jiterator_4inputs_with_extra_args_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_jiterator_4inputs_with_extra_args_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_jiterator_4inputs_with_extra_args_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_jiterator_4inputs_with_extra_args_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_jiterator_4inputs_with_extra_args_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_jiterator_binary_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_jiterator_binary_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_jiterator_binary_return_by_ref_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_jiterator_binary_return_by_ref_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_jiterator_unary_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_jiterator_unary_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_jiterator_unary_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_jiterator_unary_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_jiterator_unary_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_jiterator_unary_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_kron_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_kron_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_kron_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_kthvalue_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_kthvalue_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_kthvalue_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_lcm_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ldexp_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ldexp_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ldexp_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ldexp_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_lerp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_lgamma_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_lgamma_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_lgamma_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_lgamma_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_lgamma_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_cholesky_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_cholesky_ex_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_cond_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_cross_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_cross_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_diagonal_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_diagonal_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_diagonal_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_diagonal_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_diagonal_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_eigvalsh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_inv_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_inv_ex_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_ldl_factor_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_ldl_factor_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_ldl_factor_ex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_ldl_factor_ex_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_lu_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_lu_factor_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_lu_factor_ex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_lu_solve_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_lu_solve_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_matrix_norm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_matrix_power_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_matrix_power_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_matrix_rank_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_matrix_rank_hermitian_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_matrix_rank_hermitian_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_multi_dot_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_norm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_norm_subgradients_at_zero_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_norm_subgradients_at_zero_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_pinv_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_slogdet_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_solve_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_solve_triangular_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_solve_triangular_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_svd_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_vander_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_vander_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_vecdot_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linspace_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linspace_tensor_overload_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_log10_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_log10_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_log10_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_log10_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_log10_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_log1p_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_log_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_log_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_log_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_log_normal_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_log_softmax_with_dtype_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_log_softmax_with_dtype_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_log_softmax_with_dtype_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_log_softmax_with_dtype_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logaddexp2_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logaddexp2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logaddexp2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logcumsumexp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logcumsumexp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logical_and_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logical_not_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logical_not_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logical_not_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logical_or_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logical_or_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logical_or_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logical_xor_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logical_xor_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logical_xor_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logit_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logit_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logit_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logit_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logspace_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logspace_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logspace_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logspace_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logspace_tensor_overload_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logspace_tensor_overload_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logspace_tensor_overload_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logsumexp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logsumexp_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logsumexp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logsumexp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logsumexp_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_long_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_long_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_lt_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_lt_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_lt_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_lu_solve_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_lu_unpack_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mH_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mH_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mT_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mT_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mT_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mT_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mT_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mT_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_amax_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_amin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_amin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_argmax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_argmin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_argmin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_argmin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_argmin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_cumprod_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_cumprod_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_cumprod_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_cumprod_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_cumprod_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_cumprod_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_cumsum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_cumsum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_fill_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_fill_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_fill_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_fill_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_fill_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_logaddexp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_logaddexp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_logsumexp_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_logsumexp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_logsumexp_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_logsumexp_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_mean_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_median_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_normalize_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_prod_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_scatter_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_scatter_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_select_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_select_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_select_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_select_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_softmax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_softmin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_softmin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_softmin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_std_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_std_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_std_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_sum_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_sum_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_sum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_sum_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_sum_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_var_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_matmul_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_matrix_exp_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_max_binary_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_max_reduction_no_dim_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_max_reduction_no_dim_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_maximum_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_maximum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mean_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_median_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_meshgrid_list_of_tensors_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_meshgrid_list_of_tensors_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_meshgrid_list_of_tensors_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_meshgrid_variadic_tensors_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_meshgrid_variadic_tensors_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_meshgrid_variadic_tensors_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_meshgrid_variadic_tensors_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_min_binary_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_min_reduction_no_dim_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_min_reduction_no_dim_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_min_reduction_no_dim_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_min_reduction_with_dim_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_min_reduction_with_dim_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_min_reduction_with_dim_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_minimum_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mode_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mode_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mode_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_movedim_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_movedim_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_movedim_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mul_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mul_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mul_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mul_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mul_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mv_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mvlgamma_mvlgamma_p_1_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mvlgamma_mvlgamma_p_1_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mvlgamma_mvlgamma_p_1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mvlgamma_mvlgamma_p_3_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mvlgamma_mvlgamma_p_3_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mvlgamma_mvlgamma_p_3_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mvlgamma_mvlgamma_p_3_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mvlgamma_mvlgamma_p_5_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mvlgamma_mvlgamma_p_5_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mvlgamma_mvlgamma_p_5_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nan_to_num_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nan_to_num_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nanmean_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nanmedian_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nanmedian_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nansum_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nansum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nansum_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_narrow_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_narrow_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_narrow_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_narrow_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_narrow_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_narrow_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_native_layer_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_native_layer_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ne_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ne_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_neg_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_neg_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_neg_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_empty_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_empty_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_empty_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_empty_strided_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_empty_strided_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_empty_strided_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_empty_strided_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_empty_strided_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_empty_strided_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_full_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_full_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_full_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_full_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_ones_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_ones_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_ones_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_zeros_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_zeros_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_zeros_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_zeros_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nextafter_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nextafter_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_adaptive_avg_pool1d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_adaptive_avg_pool3d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_adaptive_max_pool1d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_adaptive_max_pool1d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_adaptive_max_pool1d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_adaptive_max_pool2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_adaptive_max_pool2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_avg_pool2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_batch_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_bilinear_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_binary_cross_entropy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_binary_cross_entropy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_binary_cross_entropy_with_logits_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_celu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_channel_shuffle_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_channel_shuffle_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_conv1d_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_conv1d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_conv1d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_conv2d_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_conv2d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_conv2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_conv3d_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_conv3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_conv_transpose1d_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_conv_transpose1d_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_conv_transpose1d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_conv_transpose2d_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_conv_transpose2d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_conv_transpose3d_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_conv_transpose3d_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_conv_transpose3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_conv_transpose3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_cosine_embedding_loss_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_cosine_embedding_loss_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_cosine_embedding_loss_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_cosine_similarity_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_cross_entropy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_ctc_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_dropout2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_dropout_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_elu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_embedding_bag_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_feature_alpha_dropout_without_train_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_gaussian_nll_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_gelu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_glu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_glu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_glu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_grid_sample_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_group_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_hardshrink_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_hardshrink_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_hardswish_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_hardtanh_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_hardtanh_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_huber_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_huber_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_instance_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_interpolate_area_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_interpolate_area_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_interpolate_bicubic_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_interpolate_bilinear_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_interpolate_linear_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_interpolate_linear_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_interpolate_nearest-exact_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_interpolate_nearest_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_interpolate_trilinear_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_kl_div_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_l1_loss_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_l1_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_leaky_relu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_linear_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_margin_ranking_loss_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_margin_ranking_loss_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_max_pool1d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_max_pool2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_max_pool2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_max_pool3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_max_pool3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_max_unpool1d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_max_unpool1d_grad_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_max_unpool1d_grad_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_max_unpool2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_max_unpool2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_max_unpool3d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_max_unpool3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_max_unpool3d_grad_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_max_unpool3d_grad_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_mish_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_mse_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_multi_head_attention_forward_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_multi_head_attention_forward_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_multilabel_margin_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_multilabel_soft_margin_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_nll_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_normalize_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_normalize_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_circular_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_circular_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_circular_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_constant_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_constant_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_constant_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_constant_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_reflect_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_reflect_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_reflect_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_reflect_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_reflect_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_replicate_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_replicate_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_replicate_negative_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_replicate_negative_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pairwise_distance_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pairwise_distance_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pairwise_distance_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pixel_shuffle_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pixel_shuffle_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pixel_shuffle_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pixel_shuffle_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pixel_unshuffle_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pixel_unshuffle_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pixel_unshuffle_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_poisson_nll_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_poisson_nll_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_poisson_nll_loss_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_relu6_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_relu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_rms_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_scaled_dot_product_attention_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_selu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_selu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_silu_complex_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_silu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_silu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_silu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_soft_margin_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_soft_margin_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_softmin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_softmin_with_dtype_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_softplus_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_softshrink_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_softshrink_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_softsign_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_tanhshrink_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_tanhshrink_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_tanhshrink_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_tanhshrink_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_tanhshrink_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_threshold_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_threshold_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_threshold_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_triplet_margin_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_triplet_margin_loss_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_triplet_margin_loss_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_triplet_margin_with_distance_loss_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_triplet_margin_with_distance_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_unfold_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_upsample_nearest_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nonzero_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nonzero_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nonzero_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nonzero_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_norm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_norm_inf_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_norm_nuc_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_norm_nuc_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_normal_in_place_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_normal_number_mean_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ones_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ones_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ones_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ones_like_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ones_like_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ones_like_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ones_like_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ones_like_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ormqr_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ormqr_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ormqr_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_outer_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_pca_lowrank_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_permute_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_permute_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_permute_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_permute_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_permute_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_permute_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_pinverse_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_pinverse_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_polygamma_polygamma_n_0_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_polygamma_polygamma_n_0_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_polygamma_polygamma_n_0_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_polygamma_polygamma_n_0_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_polygamma_polygamma_n_1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_polygamma_polygamma_n_1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_polygamma_polygamma_n_2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_polygamma_polygamma_n_3_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_polygamma_polygamma_n_3_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_polygamma_polygamma_n_3_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_polygamma_polygamma_n_3_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_polygamma_polygamma_n_3_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_polygamma_polygamma_n_4_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_polygamma_polygamma_n_4_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_polygamma_polygamma_n_4_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_polygamma_polygamma_n_4_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_positive_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_positive_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_positive_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_pow_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_pow_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_pow_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_prod_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_prod_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_put_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_qr_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_rad2deg_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_rad2deg_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_rand_like_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_randint_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_randint_like_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_randint_like_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_randn_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_randn_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_randn_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_randn_like_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ravel_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ravel_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_real_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_real_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_real_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_real_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_reciprocal_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_reciprocal_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_reciprocal_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_reciprocal_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_reciprocal_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_remainder_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_remainder_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_renorm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_renorm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_renorm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_renorm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_repeat_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_repeat_interleave_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_repeat_interleave_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_repeat_interleave_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_repeat_interleave_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_reshape_as_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_reshape_as_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_reshape_as_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_reshape_as_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_reshape_as_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_resize_as__cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_resize_as__cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_resize_as__cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_resize_as__cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_resolve_conj_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_resolve_conj_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_resolve_conj_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_resolve_neg_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_resolve_neg_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_resolve_neg_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_roll_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_roll_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_roll_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_roll_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_roll_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_roll_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_rot90_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_rot90_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_round_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_round_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_round_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_round_decimals_0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_round_decimals_0_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_round_decimals_3_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_rsqrt_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_rsqrt_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_rsqrt_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_rsqrt_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_rsub_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_rsub_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scalar_tensor_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scalar_tensor_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scalar_tensor_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scalar_tensor_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_add_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_add_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_reduce_amax_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_reduce_amin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_reduce_amin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_reduce_amin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_reduce_amin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_reduce_mean_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_reduce_mean_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_reduce_prod_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_reduce_prod_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_reduce_sum_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_reduce_sum_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_reduce_sum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_reduce_sum_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_searchsorted_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_searchsorted_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_searchsorted_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_select_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_select_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_select_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_select_scatter_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_select_scatter_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_select_scatter_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_select_scatter_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sgn_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sgn_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_short_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_short_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_short_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_short_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_short_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sigmoid_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sigmoid_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sigmoid_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sign_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sign_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_signal_windows_exponential_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_signal_windows_gaussian_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_signal_windows_general_cosine_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_signal_windows_hamming_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_signal_windows_hann_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_signbit_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_signbit_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sin_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sinc_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sinc_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sinc_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sinh_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sinh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sinh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sinh_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sinh_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_slice_scatter_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_slice_scatter_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_slice_scatter_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_slice_scatter_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_slice_scatter_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_softmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_softmax_with_dtype_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_softmax_with_dtype_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_softmax_with_dtype_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_softmax_with_dtype_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_softmax_with_dtype_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sort_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sort_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sort_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sparse_sampled_addmm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sparse_sampled_addmm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_airy_ai_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_bessel_j0_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_bessel_j0_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_bessel_j0_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_bessel_j0_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_bessel_j1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_bessel_j1_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_bessel_j1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_bessel_y0_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_bessel_y1_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_bessel_y1_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_chebyshev_polynomial_u_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_chebyshev_polynomial_u_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_chebyshev_polynomial_u_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_chebyshev_polynomial_v_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_chebyshev_polynomial_v_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_chebyshev_polynomial_w_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_chebyshev_polynomial_w_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_entr_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_entr_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_entr_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_entr_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_erfcx_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_hermite_polynomial_h_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_hermite_polynomial_he_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_i0e_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_i0e_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_i0e_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_i1_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_i1e_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_i1e_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_i1e_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_i1e_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_laguerre_polynomial_l_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_legendre_polynomial_p_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_log_ndtr_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_modified_bessel_i0_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_modified_bessel_i0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_modified_bessel_i1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_modified_bessel_i1_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_modified_bessel_i1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_modified_bessel_k0_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_modified_bessel_k0_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_modified_bessel_k1_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_modified_bessel_k1_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_modified_bessel_k1_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_ndtr_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_ndtri_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_ndtri_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_ndtri_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_polygamma_special_polygamma_n_0_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_polygamma_special_polygamma_n_0_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_scaled_modified_bessel_k0_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_scaled_modified_bessel_k1_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_scaled_modified_bessel_k1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_scaled_modified_bessel_k1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_shifted_chebyshev_polynomial_t_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_shifted_chebyshev_polynomial_t_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_shifted_chebyshev_polynomial_t_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_shifted_chebyshev_polynomial_v_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_shifted_chebyshev_polynomial_v_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_shifted_chebyshev_polynomial_v_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_shifted_chebyshev_polynomial_w_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_shifted_chebyshev_polynomial_w_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_shifted_chebyshev_polynomial_w_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_spherical_bessel_j0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_spherical_bessel_j0_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_spherical_bessel_j0_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_xlog1py_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_xlog1py_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_xlog1py_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_xlog1py_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_zeta_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_zeta_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_zeta_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_split_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_split_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_split_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_split_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_split_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_split_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_split_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_split_list_args_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_split_list_args_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_split_list_args_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_split_list_args_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_split_list_args_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_split_list_args_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_split_with_sizes_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_split_with_sizes_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_split_with_sizes_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_split_with_sizes_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_split_with_sizes_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_split_with_sizes_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_split_with_sizes_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sqrt_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sqrt_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_square_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_square_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_square_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_square_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_square_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_squeeze_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_squeeze_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_squeeze_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_squeeze_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_squeeze_multiple_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_squeeze_multiple_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_squeeze_multiple_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_squeeze_multiple_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_squeeze_multiple_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_squeeze_multiple_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_stack_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_stack_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_stack_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_stack_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_std_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_std_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_std_mean_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_std_mean_unbiased_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_std_mean_unbiased_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_std_unbiased_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_stft_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sub_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sub_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sub_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sub_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sub_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sum_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sum_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sum_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sum_to_size_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sum_to_size_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sum_to_size_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_svd_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_svd_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_t_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_t_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_t_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_t_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_t_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_t_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_take_along_dim_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_take_along_dim_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_take_along_dim_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_take_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_take_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tan_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tan_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tan_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tan_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tanh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tanh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tanh_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tanh_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tensor_split_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tensor_split_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tensor_split_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tensordot_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tensordot_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tile_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tile_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tile_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_to_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_to_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_to_sparse_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_to_sparse_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_to_sparse_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_to_sparse_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_to_sparse_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_to_sparse_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_topk_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_topk_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_topk_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_topk_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_torch_ops_aten__efficient_attention_forward_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_torch_ops_aten__efficient_attention_forward_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_torch_ops_aten__safe_softmax_default_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_torch_ops_aten__safe_softmax_default_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_torch_ops_aten__safe_softmax_default_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_trace_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_trace_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_trace_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_transpose_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_transpose_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_transpose_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_transpose_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_transpose_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_transpose_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_transpose_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_trapezoid_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_trapezoid_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_trapezoid_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_trapz_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_trapz_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_triangular_solve_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tril_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tril_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tril_indices_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_triu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_triu_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_triu_indices_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_triu_indices_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_true_divide_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_trunc_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_trunc_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unbind_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unbind_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unbind_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unbind_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unbind_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unflatten_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unflatten_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unfold_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unfold_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unfold_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unfold_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_uniform_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unique_consecutive_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unique_consecutive_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unique_consecutive_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unique_consecutive_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unique_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unravel_index_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unsafe_split_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unsafe_split_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unsafe_split_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unsafe_split_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unsqueeze_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unsqueeze_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unsqueeze_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unsqueeze_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unsqueeze_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unsqueeze_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unsqueeze_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unsqueeze_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unsqueeze_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_var_mean_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_var_mean_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_var_mean_unbiased_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_var_mean_unbiased_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_var_mean_unbiased_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_var_unbiased_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_vdot_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_view_as_complex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_view_as_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_view_as_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_view_as_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_view_as_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_view_as_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_view_as_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_view_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_view_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_view_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_vsplit_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_vsplit_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_vstack_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_vstack_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_vstack_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_where_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_where_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_where_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_where_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_where_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_where_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_xlogy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_xlogy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_xlogy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_xlogy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_xlogy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_zero__cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_zero__cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_zero__cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_zeros_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_zeros_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_zeros_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_zeros_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_zeros_like_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_zeros_like_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_zeros_like_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_zeros_like_cuda_int16, test/test_meta.py::TestMetaCUDA::test_nonzero_cuda 2025-08-14T23:43:02.1563737Z 2025-08-14T23:43:02.1563890Z Running xpu/test_gemm 1/1 ... [2025-08-14 23:43:01.695153] 2025-08-14T23:43:02.1564185Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T23:43:02.1564924Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'xpu/test_gemm.py', '-m', 'not serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-08-14 23:43:01.695703] 2025-08-14T23:43:05.3855377Z 2025-08-14T23:43:05.3856643Z xpu/test_gemm 1/1 was successful, full logs can be found in artifacts with path test/test-reports/xpu.test_gemm_1.1_c13162caaed728ad_.log 2025-08-14T23:43:05.3857312Z Running 0 items in this shard: 2025-08-14T23:43:05.3857493Z 2025-08-14T23:45:23.5986942Z 2025-08-14T23:45:23.5988268Z test_transformers 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_transformers_1.1_d8f3ea28b7471470_.log 2025-08-14T23:45:24.3577705Z Running 12244 items in this shard: test/test_transformers.py::TestTransformersCUDA::test_bias_is_none_cuda, test/test_transformers.py::TestTransformersCUDA::test_decoder_only_layer_cuda, test/test_transformers.py::TestTransformersCUDA::test_decoder_padding_and_src_mask_bool_cuda, test/test_transformers.py::TestTransformersCUDA::test_disable_fastpath_cuda, test/test_transformers.py::TestTransformersCUDA::test_encoder_is_causal_cuda, test/test_transformers.py::TestTransformersCUDA::test_encoder_padding_and_src_mask_bool_cuda, test/test_transformers.py::TestTransformersCUDA::test_is_causal_gpu_cuda, test/test_transformers.py::TestTransformersCUDA::test_kpm_mask_trailing_column_with_nested_tensor_cuda, test/test_transformers.py::TestTransformersCUDA::test_mask_check_fastpath_cuda, test/test_transformers.py::TestTransformersCUDA::test_math_backend_high_precision_cuda, test/test_transformers.py::TestTransformersCUDA::test_mha_native_args_nb_heads_1_bias_False_cuda, test/test_transformers.py::TestTransformersCUDA::test_mha_native_args_nb_heads_1_bias_True_cuda, test/test_transformers.py::TestTransformersCUDA::test_mha_native_args_nb_heads_8_bias_False_cuda, test/test_transformers.py::TestTransformersCUDA::test_mha_native_args_nb_heads_8_bias_True_cuda, test/test_transformers.py::TestTransformersCUDA::test_multiheadattention_fastpath_attn_mask_attn_mask_dim2_key_padding_mask_dim1_bool_cuda, test/test_transformers.py::TestTransformersCUDA::test_multiheadattention_fastpath_attn_mask_attn_mask_dim2_key_padding_mask_dim1_float32_cuda, test/test_transformers.py::TestTransformersCUDA::test_multiheadattention_fastpath_attn_mask_attn_mask_dim2_key_padding_mask_dim_2_bool_cuda, test/test_transformers.py::TestTransformersCUDA::test_multiheadattention_fastpath_attn_mask_attn_mask_dim2_key_padding_mask_dim_2_float32_cuda, test/test_transformers.py::TestTransformersCUDA::test_multiheadattention_fastpath_attn_mask_attn_mask_dim_2_key_padding_mask_dim1_bool_cuda, test/test_transformers.py::TestTransformersCUDA::test_multiheadattention_fastpath_attn_mask_attn_mask_dim_2_key_padding_mask_dim1_float32_cuda, test/test_transformers.py::TestTransformersCUDA::test_multiheadattention_fastpath_attn_mask_attn_mask_dim_2_key_padding_mask_dim_2_bool_cuda, test/test_transformers.py::TestTransformersCUDA::test_multiheadattention_fastpath_attn_mask_attn_mask_dim_2_key_padding_mask_dim_2_float32_cuda, test/test_transformers.py::TestTransformersCUDA::test_multiheadattention_fastpath_attn_mask_attn_mask_dim_3_key_padding_mask_dim1_bool_cuda, test/test_transformers.py::TestTransformersCUDA::test_multiheadattention_fastpath_attn_mask_attn_mask_dim_3_key_padding_mask_dim1_float32_cuda, test/test_transformers.py::TestTransformersCUDA::test_multiheadattention_fastpath_attn_mask_attn_mask_dim_3_key_padding_mask_dim_2_bool_cuda, test/test_transformers.py::TestTransformersCUDA::test_multiheadattention_fastpath_attn_mask_attn_mask_dim_3_key_padding_mask_dim_2_float32_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_3D_input_dim_2D_attn_mask_dropout_p_0_0_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_3D_input_dim_2D_attn_mask_dropout_p_0_2_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_3D_input_dim_2D_attn_mask_dropout_p_0_5_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_3D_input_dim_2D_causal_attn_mask_dropout_p_0_0_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_3D_input_dim_2D_causal_attn_mask_dropout_p_0_2_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_3D_input_dim_2D_causal_attn_mask_dropout_p_0_5_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_3D_input_dim_3D_attn_mask_dropout_p_0_0_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_3D_input_dim_3D_attn_mask_dropout_p_0_2_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_3D_input_dim_3D_attn_mask_dropout_p_0_5_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_3D_input_dim_3D_causal_attn_mask_dropout_p_0_0_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_3D_input_dim_3D_causal_attn_mask_dropout_p_0_2_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_3D_input_dim_3D_causal_attn_mask_dropout_p_0_5_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_3D_input_dim_no_attn_mask_dropout_p_0_0_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_3D_input_dim_no_attn_mask_dropout_p_0_2_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_3D_input_dim_no_attn_mask_dropout_p_0_5_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_4D_input_dim_2D_attn_mask_dropout_p_0_0_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_4D_input_dim_2D_attn_mask_dropout_p_0_2_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_4D_input_dim_2D_attn_mask_dropout_p_0_5_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_4D_input_dim_2D_causal_attn_mask_dropout_p_0_0_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_4D_input_dim_2D_causal_attn_mask_dropout_p_0_2_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_4D_input_dim_2D_causal_attn_mask_dropout_p_0_5_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_4D_input_dim_4D_attn_mask_dropout_p_0_0_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_4D_input_dim_4D_attn_mask_dropout_p_0_2_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_4D_input_dim_4D_attn_mask_dropout_p_0_5_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_4D_input_dim_4D_causal_attn_mask_dropout_p_0_0_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_4D_input_dim_4D_causal_attn_mask_dropout_p_0_2_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_4D_input_dim_4D_causal_attn_mask_dropout_p_0_5_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_4D_input_dim_no_attn_mask_dropout_p_0_0_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_4D_input_dim_no_attn_mask_dropout_p_0_2_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_4D_input_dim_no_attn_mask_dropout_p_0_5_cuda, test/test_transformers.py::TestTransformersCUDA::test_script_encoder_subclass_cuda, test/test_transformers.py::TestTransformersCUDA::test_script_mha_in_proj_weight_none_cuda, test/test_transformers.py::TestTransformersCUDA::test_self_attn_TxT_attn_mask_cuda, test/test_transformers.py::TestTransformersCUDA::test_train_with_is_causal_cuda, test/test_transformers.py::TestTransformersCUDA::test_train_with_pad_and_catch_error_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformer_bias_is_none_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformerencoder_batch_first_False_training_False_enable_nested_tensor_False_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformerencoder_batch_first_False_training_False_enable_nested_tensor_True_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformerencoder_batch_first_False_training_True_enable_nested_tensor_False_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformerencoder_batch_first_False_training_True_enable_nested_tensor_True_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformerencoder_batch_first_True_training_False_enable_nested_tensor_False_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformerencoder_batch_first_True_training_False_enable_nested_tensor_True_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformerencoder_batch_first_True_training_True_enable_nested_tensor_False_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformerencoder_batch_first_True_training_True_enable_nested_tensor_True_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformerencoder_fastpath_use_torchscript_False_enable_nested_tensor_False_use_autocast_False_d_model_12_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformerencoder_fastpath_use_torchscript_False_enable_nested_tensor_False_use_autocast_False_d_model_256_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformerencoder_fastpath_use_torchscript_False_enable_nested_tensor_False_use_autocast_True_d_model_12_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformerencoder_fastpath_use_torchscript_False_enable_nested_tensor_False_use_autocast_True_d_model_256_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformerencoder_fastpath_use_torchscript_False_enable_nested_tensor_True_use_autocast_False_d_model_12_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformerencoder_fastpath_use_torchscript_False_enable_nested_tensor_True_use_autocast_False_d_model_256_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformerencoder_fastpath_use_torchscript_False_enable_nested_tensor_True_use_autocast_True_d_model_12_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformerencoder_fastpath_use_torchscript_False_enable_nested_tensor_True_use_autocast_True_d_model_256_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformerencoder_square_input_with_no_grad_False_training_False_enable_nested_tensor_False_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformerencoder_square_input_with_no_grad_False_training_True_enable_nested_tensor_False_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformerencoder_square_input_with_no_grad_True_training_False_enable_nested_tensor_False_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformerencoder_square_input_with_no_grad_True_training_True_enable_nested_tensor_False_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformerencoderlayer_no_fastpath_with_hooks_nhead_3_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformerencoderlayer_no_fastpath_with_hooks_nhead_4_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformerencoderlayer_src_mask_nhead_1_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformerencoderlayer_src_mask_nhead_4_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformerencoderlayer_src_mask_nhead_8_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformerencoderlayer_subclass_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformerencoderlayer_subclass_model_cuda, test/test_transformers.py::TestTransformersCUDA::test_with_nested_tensor_input_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_dispatch_fails_no_backend_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_flash_atteention_large_bf16_nan_values_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_flash_attention_fail_with_non_square_causal_attention_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_flash_autocast_fp32_bfloat16_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_flash_autocast_fp32_float16_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_flash_backward_failure_sm86plus_head_dim_193_dropout_p_0_0_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_flash_backward_failure_sm86plus_head_dim_193_dropout_p_0_2_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_flash_backward_failure_sm86plus_head_dim_256_dropout_p_0_0_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_flash_backward_failure_sm86plus_head_dim_256_dropout_p_0_2_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_flash_fail_fp32_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_fused_kernels_nested_broadcasting_error_cases_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_fused_kernels_nested_broadcasting_requires_grad_failure_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_fused_kernels_seq_len_0_inputs_fused_kernel0_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_fused_kernels_seq_len_0_inputs_fused_kernel1_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_invalid_fused_inputs_attn_mask_present_kernel0_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_invalid_fused_inputs_broadcast_kernel0_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_invalid_fused_inputs_broadcast_kernel1_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_invalid_fused_inputs_dim_3_kernel0_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_invalid_fused_inputs_dim_3_kernel1_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_invalid_fused_inputs_head_dim_kernel0_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_invalid_fused_inputs_head_dim_kernel1_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_invalid_fused_inputs_invalid_dtype_kernel0_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_invalid_fused_inputs_invalid_dtype_kernel1_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_invalid_inputs_1_dimensional_inputs_kernel0_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_invalid_inputs_1_dimensional_inputs_kernel1_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_invalid_inputs_1_dimensional_inputs_kernel2_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_invalid_inputs_different_datatypes_kernel0_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_invalid_inputs_different_datatypes_kernel1_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_invalid_inputs_different_datatypes_kernel2_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_invalid_inputs_different_devices_kernel0_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_invalid_inputs_different_devices_kernel1_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_invalid_inputs_different_devices_kernel2_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_invalid_last_dim_stride_kernel0_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_invalid_last_dim_stride_kernel1_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_invalid_sdpa_kernel_grouped_query_attention_cuda_fused_kernel0_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_invalid_sequence_lengths_kernel0_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_invalid_sequence_lengths_kernel1_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_mask_invalid_last_dim_stride_kernel0_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_mask_invalid_last_dim_stride_kernel1_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_mem_eff_attention_fail_with_batch_size_geq_65536_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_mem_eff_attention_fail_with_batch_size_geq_65536_error_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_mem_eff_attention_large_seq_len_uniform_attention_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_mem_efficient_fail_bfloat16_less_than_sm80_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_nested_fails_on_padding_head_dim_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_unaligned_tensors_cuda, test/test_transformers.py::TestSDPACUDA::test_scaled_dot_product_attention_math_with_negative_scale_kernel0_cuda, test/test_transformers.py::TestSDPACUDA::test_sdp_math_gradcheck_contiguous_inputs_False_cuda, test/test_transformers.py::TestSDPACUDA::test_sdp_math_gradcheck_contiguous_inputs_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_cudnn_attention_d256_heuristic_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_cudnn_attention_different_dk_dv_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_cudnn_attention_fail_d128_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_cudnn_attention_gqa_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_cudnn_attention_nonmodulo64seqlen_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_cudnn_attention_preserves_query_layout_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_cudnn_attention_trivial_output_transpose_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_256_head_dim_64_dropout_p_0_0_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_256_head_dim_64_dropout_p_0_0_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_256_head_dim_64_dropout_p_0_0_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_256_head_dim_64_dropout_p_0_0_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_256_head_dim_64_dropout_p_0_1_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_256_head_dim_64_dropout_p_0_1_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_256_head_dim_64_dropout_p_0_1_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_256_head_dim_64_dropout_p_0_1_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_256_head_dim_8_dropout_p_0_0_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_256_head_dim_8_dropout_p_0_0_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_256_head_dim_8_dropout_p_0_0_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_256_head_dim_8_dropout_p_0_0_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_256_head_dim_8_dropout_p_0_1_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_256_head_dim_8_dropout_p_0_1_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_256_head_dim_8_dropout_p_0_1_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_256_head_dim_8_dropout_p_0_1_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_32_head_dim_64_dropout_p_0_0_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_32_head_dim_64_dropout_p_0_0_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_32_head_dim_64_dropout_p_0_0_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_32_head_dim_64_dropout_p_0_0_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_32_head_dim_64_dropout_p_0_1_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_32_head_dim_64_dropout_p_0_1_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_32_head_dim_64_dropout_p_0_1_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_32_head_dim_64_dropout_p_0_1_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_32_head_dim_8_dropout_p_0_0_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_32_head_dim_8_dropout_p_0_0_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_32_head_dim_8_dropout_p_0_0_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_32_head_dim_8_dropout_p_0_0_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_32_head_dim_8_dropout_p_0_1_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_32_head_dim_8_dropout_p_0_1_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_32_head_dim_8_dropout_p_0_1_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_32_head_dim_8_dropout_p_0_1_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_256_head_dim_64_dropout_p_0_0_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_256_head_dim_64_dropout_p_0_0_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_256_head_dim_64_dropout_p_0_0_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_256_head_dim_64_dropout_p_0_0_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_256_head_dim_64_dropout_p_0_1_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_256_head_dim_64_dropout_p_0_1_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_256_head_dim_64_dropout_p_0_1_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_256_head_dim_64_dropout_p_0_1_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_256_head_dim_8_dropout_p_0_0_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_256_head_dim_8_dropout_p_0_0_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_256_head_dim_8_dropout_p_0_0_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_256_head_dim_8_dropout_p_0_0_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_256_head_dim_8_dropout_p_0_1_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_256_head_dim_8_dropout_p_0_1_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_256_head_dim_8_dropout_p_0_1_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_256_head_dim_8_dropout_p_0_1_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_32_head_dim_64_dropout_p_0_0_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_32_head_dim_64_dropout_p_0_0_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_32_head_dim_64_dropout_p_0_0_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_32_head_dim_64_dropout_p_0_0_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_32_head_dim_64_dropout_p_0_1_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_32_head_dim_64_dropout_p_0_1_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_32_head_dim_64_dropout_p_0_1_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_32_head_dim_64_dropout_p_0_1_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_32_head_dim_8_dropout_p_0_0_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_32_head_dim_8_dropout_p_0_0_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_32_head_dim_8_dropout_p_0_0_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_32_head_dim_8_dropout_p_0_0_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_32_head_dim_8_dropout_p_0_1_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_32_head_dim_8_dropout_p_0_1_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_32_head_dim_8_dropout_p_0_1_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_32_head_dim_8_dropout_p_0_1_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_256_head_dim_64_dropout_p_0_0_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_256_head_dim_64_dropout_p_0_0_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_256_head_dim_64_dropout_p_0_0_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_256_head_dim_64_dropout_p_0_0_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_256_head_dim_64_dropout_p_0_1_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_256_head_dim_64_dropout_p_0_1_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_256_head_dim_64_dropout_p_0_1_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_256_head_dim_64_dropout_p_0_1_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_256_head_dim_8_dropout_p_0_0_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_256_head_dim_8_dropout_p_0_0_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_256_head_dim_8_dropout_p_0_0_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_256_head_dim_8_dropout_p_0_0_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_256_head_dim_8_dropout_p_0_1_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_256_head_dim_8_dropout_p_0_1_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_256_head_dim_8_dropout_p_0_1_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_256_head_dim_8_dropout_p_0_1_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_32_head_dim_64_dropout_p_0_0_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_32_head_dim_64_dropout_p_0_0_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_32_head_dim_64_dropout_p_0_0_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_32_head_dim_64_dropout_p_0_0_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_32_head_dim_64_dropout_p_0_1_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_32_head_dim_64_dropout_p_0_1_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_32_head_dim_64_dropout_p_0_1_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_32_head_dim_64_dropout_p_0_1_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_32_head_dim_8_dropout_p_0_0_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_32_head_dim_8_dropout_p_0_0_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_32_head_dim_8_dropout_p_0_0_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_32_head_dim_8_dropout_p_0_0_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_32_head_dim_8_dropout_p_0_1_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_32_head_dim_8_dropout_p_0_1_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_32_head_dim_8_dropout_p_0_1_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_32_head_dim_8_dropout_p_0_1_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_256_head_dim_64_dropout_p_0_0_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_256_head_dim_64_dropout_p_0_0_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_256_head_dim_64_dropout_p_0_0_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_256_head_dim_64_dropout_p_0_0_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_256_head_dim_64_dropout_p_0_1_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_256_head_dim_64_dropout_p_0_1_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_256_head_dim_64_dropout_p_0_1_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_256_head_dim_64_dropout_p_0_1_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_256_head_dim_8_dropout_p_0_0_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_256_head_dim_8_dropout_p_0_0_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_256_head_dim_8_dropout_p_0_0_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_256_head_dim_8_dropout_p_0_0_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_256_head_dim_8_dropout_p_0_1_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_256_head_dim_8_dropout_p_0_1_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_256_head_dim_8_dropout_p_0_1_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_256_head_dim_8_dropout_p_0_1_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_32_head_dim_64_dropout_p_0_0_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_32_head_dim_64_dropout_p_0_0_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_32_head_dim_64_dropout_p_0_0_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_32_head_dim_64_dropout_p_0_0_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_32_head_dim_64_dropout_p_0_1_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_32_head_dim_64_dropout_p_0_1_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_32_head_dim_64_dropout_p_0_1_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_32_head_dim_64_dropout_p_0_1_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_32_head_dim_8_dropout_p_0_0_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_32_head_dim_8_dropout_p_0_0_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_32_head_dim_8_dropout_p_0_0_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_32_head_dim_8_dropout_p_0_0_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_32_head_dim_8_dropout_p_0_1_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_32_head_dim_8_dropout_p_0_1_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_32_head_dim_8_dropout_p_0_1_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_32_head_dim_8_dropout_p_0_1_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_different_dk_dv_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_backwards_throws_determinism_warning_fused_kernel0_warn_only_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_backwards_throws_determinism_warning_fused_kernel0_warn_only_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_backwards_throws_determinism_warning_fused_kernel1_warn_only_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_backwards_throws_determinism_warning_fused_kernel1_warn_only_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_query_dense_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_seq_len_1_inputs_fused_kernel0_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_seq_len_1_inputs_fused_kernel1_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_sdp_choice_type_dense_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_sdp_choice_type_nested_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_sdp_priority_order_use_compile_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_sdp_priority_order_use_compile_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_eff_attention_long_sequence_mask_float16_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_eff_attention_long_sequence_mask_float32_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_eff_attention_non_contig_mask_bug_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_eff_attention_non_contiguous_mask_float16_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_eff_attention_non_contiguous_mask_float32_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_eff_backwards_determinism_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_mask_variants_mask_dim_1_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_mask_variants_mask_dim_2_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_mask_variants_mask_dim_3_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_mask_variants_mask_dim_4_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_scaled_dot_product_attention_cudnn_nested_type_nested_is_contiguous_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_scaled_dot_product_attention_fused_kernels_packed_accuracy_type_dense_fused_kernel0_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_scaled_dot_product_attention_fused_kernels_packed_accuracy_type_dense_fused_kernel1_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_scaled_dot_product_attention_fused_kernels_packed_accuracy_type_nested_fused_kernel0_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_scaled_dot_product_attention_fused_kernels_packed_accuracy_type_nested_fused_kernel1_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_scaled_dot_product_attention_fused_kernels_packed_type_dense_is_contiguous_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_scaled_dot_product_attention_fused_kernels_packed_type_dense_is_contiguous_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_scaled_dot_product_attention_fused_kernels_packed_type_nested_is_contiguous_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_scaled_dot_product_attention_fused_kernels_packed_type_nested_is_contiguous_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_sdp_choice_with_determinism_warn_only_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_sdp_choice_with_determinism_warn_only_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_sdp_flash_attention_grad_against_math_contiguous_inputs_False_is_causal_False_bfloat16_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_sdp_flash_attention_grad_against_math_contiguous_inputs_False_is_causal_False_float16_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_sdp_flash_attention_grad_against_math_contiguous_inputs_False_is_causal_True_bfloat16_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_sdp_flash_attention_grad_against_math_contiguous_inputs_False_is_causal_True_float16_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_sdp_flash_attention_grad_against_math_contiguous_inputs_True_is_causal_False_bfloat16_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_sdp_flash_attention_grad_against_math_contiguous_inputs_True_is_causal_False_float16_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_sdp_flash_attention_grad_against_math_contiguous_inputs_True_is_causal_True_bfloat16_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_sdp_flash_attention_grad_against_math_contiguous_inputs_True_is_causal_True_float16_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_sdp_mem_efficient_grad_against_math_contiguous_inputs_False_is_causal_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_sdp_mem_efficient_grad_against_math_contiguous_inputs_False_is_causal_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_sdp_mem_efficient_grad_against_math_contiguous_inputs_True_is_causal_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_sdp_mem_efficient_grad_against_math_contiguous_inputs_True_is_causal_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_singelton_head_dim_stride_ne_1_cuda, test/test_transformers.py::TestAttnBiasCUDA::test_causal_variants_causal_variant_CausalVariant_LOWER_RIGHT_shape0_cuda, test/test_transformers.py::TestAttnBiasCUDA::test_causal_variants_causal_variant_CausalVariant_LOWER_RIGHT_shape1_cuda, test/test_transformers.py::TestAttnBiasCUDA::test_causal_variants_causal_variant_CausalVariant_LOWER_RIGHT_shape2_cuda, test/test_transformers.py::TestAttnBiasCUDA::test_causal_variants_causal_variant_CausalVariant_LOWER_RIGHT_shape3_cuda, test/test_transformers.py::TestAttnBiasCUDA::test_causal_variants_causal_variant_CausalVariant_UPPER_LEFT_shape0_cuda, test/test_transformers.py::TestAttnBiasCUDA::test_causal_variants_causal_variant_CausalVariant_UPPER_LEFT_shape1_cuda, test/test_transformers.py::TestAttnBiasCUDA::test_causal_variants_causal_variant_CausalVariant_UPPER_LEFT_shape2_cuda, test/test_transformers.py::TestAttnBiasCUDA::test_causal_variants_causal_variant_CausalVariant_UPPER_LEFT_shape3_cuda, test/test_transformers.py::TestAttnBiasCUDA::test_causal_variants_compile_causal_variant_CausalVariant_LOWER_RIGHT_shape0_cuda, test/test_transformers.py::TestAttnBiasCUDA::test_causal_variants_compile_causal_variant_CausalVariant_LOWER_RIGHT_shape1_cuda, test/test_transformers.py::TestAttnBiasCUDA::test_causal_variants_compile_causal_variant_CausalVariant_LOWER_RIGHT_shape2_cuda, test/test_transformers.py::TestAttnBiasCUDA::test_causal_variants_compile_causal_variant_CausalVariant_LOWER_RIGHT_shape3_cuda, test/test_transformers.py::TestAttnBiasCUDA::test_causal_variants_compile_causal_variant_CausalVariant_UPPER_LEFT_shape0_cuda, test/test_transformers.py::TestAttnBiasCUDA::test_causal_variants_compile_causal_variant_CausalVariant_UPPER_LEFT_shape1_cuda, test/test_transformers.py::TestAttnBiasCUDA::test_causal_variants_compile_causal_variant_CausalVariant_UPPER_LEFT_shape2_cuda, test/test_transformers.py::TestAttnBiasCUDA::test_causal_variants_compile_causal_variant_CausalVariant_UPPER_LEFT_shape3_cuda, test/test_transformers.py::TestAttnBiasCUDA::test_is_causal_and_mask_fails_cuda, test/test_transformers.py::TestAttnBiasCUDA::test_is_causal_equals_upper_left_shape0_cuda, test/test_transformers.py::TestAttnBiasCUDA::test_is_causal_equals_upper_left_shape1_cuda, test/test_transformers.py::TestAttnBiasCUDA::test_is_causal_equals_upper_left_shape2_cuda, test/test_transformers.py::TestAttnBiasCUDA::test_is_causal_equals_upper_left_shape3_cuda 2025-08-14T23:45:25.0264986Z 2025-08-14T23:45:25.0265136Z Running test batch 'tests to run' cost 8545.94 seconds 2025-08-14T23:45:25.1645352Z 2025-08-14T23:45:25.1645538Z real 142m30.740s 2025-08-14T23:45:25.1645849Z user 1439m4.950s 2025-08-14T23:45:25.1646110Z sys 77m1.369s 2025-08-14T23:45:25.1646370Z + assert_git_not_dirty 2025-08-14T23:45:25.1646688Z + [[ linux-jammy-rocm-py3.10 != *rocm* ]] 2025-08-14T23:45:25.1647062Z + test_libtorch 2 2025-08-14T23:45:25.1647333Z + local SHARD=2 2025-08-14T23:45:25.1647612Z + [[ default != \s\l\o\w ]] 2025-08-14T23:45:25.1647959Z + echo 'Testing libtorch' 2025-08-14T23:45:25.1648254Z Testing libtorch 2025-08-14T23:45:25.1649441Z + ln -sf /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/lib/libbackend_with_compiler.so /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/bin 2025-08-14T23:45:25.1680738Z + ln -sf /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/lib/libjitbackend_test.so /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/bin 2025-08-14T23:45:25.1710491Z + ln -sf /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/lib/libcaffe2_nvrtc.so /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/bin 2025-08-14T23:45:25.1730984Z + ln -sf /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/lib/libc10.so /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/lib/libc10_hip.so /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/bin 2025-08-14T23:45:25.1752096Z + ln -sf /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/lib/libshm /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/lib/libshm.so /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/lib/libshm_windows /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/bin 2025-08-14T23:45:25.1774998Z + ln -sf /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/lib/libtorch.so /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/lib/libtorch_global_deps.so /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/lib/libtorch_hip.so /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/lib/libtorch_python.so /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/lib/libtorchbind_test.so /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/bin 2025-08-14T23:45:25.1795852Z + ln -sf '/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/lib/libnvfuser*' /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/bin 2025-08-14T23:45:25.1813468Z + export CPP_TESTS_DIR=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/bin 2025-08-14T23:45:25.1814084Z + CPP_TESTS_DIR=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/bin 2025-08-14T23:45:25.1814500Z + [[ -z 2 ]] 2025-08-14T23:45:25.1814707Z + [[ 2 == \1 ]] 2025-08-14T23:45:25.1814947Z + [[ -z 2 ]] 2025-08-14T23:45:25.1815148Z + [[ 2 == \2 ]] 2025-08-14T23:45:25.1815359Z + test_libtorch_jit 2025-08-14T23:45:25.1815576Z + pushd test 2025-08-14T23:45:25.1815788Z ~/pytorch/test ~/pytorch 2025-08-14T23:45:25.1816056Z + python cpp/jit/tests_setup.py setup 2025-08-14T23:45:26.6948843Z + popd 2025-08-14T23:45:26.6949303Z ~/pytorch 2025-08-14T23:45:26.6949787Z + [[ linux-jammy-rocm-py3.10 == *cuda* ]] 2025-08-14T23:45:26.6950831Z + python test/run_test.py --cpp --verbose -i cpp/test_jit cpp/test_lazy -k 'not CUDA' 2025-08-14T23:45:29.1854446Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/hypothesis/entry_points.py:23: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. 2025-08-14T23:45:29.1857130Z import pkg_resources 2025-08-14T23:45:30.2880831Z Downloading https://ossci-metrics.s3.amazonaws.com/disabled-tests-condensed.json to /var/lib/jenkins/pytorch/test/.pytorch-disabled-tests.json 2025-08-14T23:45:30.2978770Z Found test times from artifacts 2025-08-14T23:45:30.3390668Z Found test times from artifacts 2025-08-14T23:45:30.3400733Z Running all tests 2025-08-14T23:45:30.3408679Z Running parallel tests on 2 processes 2025-08-14T23:45:30.3409438Z Name: tests to run (est. time: 0.0min) 2025-08-14T23:45:30.3410010Z Serial tests (0): 2025-08-14T23:45:30.3410445Z Parallel tests (2): 2025-08-14T23:45:30.3410877Z cpp/test_jit 1/1 2025-08-14T23:45:30.3411365Z cpp/test_lazy 1/1 2025-08-14T23:45:30.3411808Z Name: excluded (est. time: 0.0min) 2025-08-14T23:45:30.3412312Z Serial tests (0): 2025-08-14T23:45:30.3412772Z Parallel tests (0): 2025-08-14T23:45:30.3413645Z Running cpp/test_jit 1/1 ... [2025-08-14 23:45:30.340894] 2025-08-14T23:45:30.3414869Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T23:45:30.3419103Z Executing ['pytest', '/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/bin/test_jit', '-m', 'serial', '-v', '-vv', '-rfEX', '-n', '2', '--junit-xml-reruns', 'test-reports/python-pytest/test.run_test/test.run_test-b0fd4eb233021ad8.xml', '-k', 'not CUDA', '-x', '--reruns=2'] ... [2025-08-14 23:45:30.341436] 2025-08-14T23:45:31.4600887Z 2025-08-14T23:45:31.4602334Z cpp/test_jit 1/1 was successful, full logs can be found in artifacts with path test/test-reports/cpp.test_jit_1.1_276af74f64b805f3_.log 2025-08-14T23:45:31.4603304Z 2025-08-14T23:45:31.4603815Z GITHUB_RUN_ID, GITHUB_RUN_ATTEMPT, or ARTIFACTS_FILE_SUFFIX not set, not uploading 2025-08-14T23:45:31.4604652Z Uploading artifacts took 0.00 seconds 2025-08-14T23:45:31.4605293Z Running cpp/test_lazy 1/1 ... [2025-08-14 23:45:31.459751] 2025-08-14T23:45:31.4605951Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T23:45:31.4609932Z Executing ['pytest', '/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/bin/test_lazy', '-m', 'serial', '-v', '-vv', '-rfEX', '-n', '2', '--junit-xml-reruns', 'test-reports/python-pytest/test.run_test/test.run_test-2bf3a02e3170a9bb.xml', '-k', 'not CUDA', '-x', '--reruns=2'] ... [2025-08-14 23:45:31.460408] 2025-08-14T23:45:32.2782748Z 2025-08-14T23:45:32.2784668Z cpp/test_lazy 1/1 was successful, full logs can be found in artifacts with path test/test-reports/cpp.test_lazy_1.1_2fb7343cc49e756f_.log 2025-08-14T23:45:32.2785732Z 2025-08-14T23:45:34.8282989Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/hypothesis/entry_points.py:23: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. 2025-08-14T23:45:34.8285613Z import pkg_resources 2025-08-14T23:45:34.8392092Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/hypothesis/entry_points.py:23: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. 2025-08-14T23:45:34.8394587Z import pkg_resources 2025-08-14T23:45:35.2971920Z Running cpp/test_jit 1/1 ... [2025-08-14 23:45:35.296632] 2025-08-14T23:45:35.2972736Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T23:45:35.2977514Z Executing ['pytest', '/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/bin/test_jit', '-m', 'not serial', '-v', '-vv', '-rfEX', '-n', '2', '--junit-xml-reruns', 'test-reports/python-pytest/test.run_test/test.run_test-818ad6eafaf9e3bd.xml', '-k', 'not CUDA', '-x', '--reruns=2'] ... [2025-08-14 23:45:35.297129] 2025-08-14T23:45:35.3050334Z Running cpp/test_lazy 1/1 ... [2025-08-14 23:45:35.304212] 2025-08-14T23:45:35.3051066Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-08-14T23:45:35.3053255Z Executing ['pytest', '/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/bin/test_lazy', '-m', 'not serial', '-v', '-vv', '-rfEX', '-n', '2', '--junit-xml-reruns', 'test-reports/python-pytest/test.run_test/test.run_test-271899efeb45eab4.xml', '-k', 'not CUDA', '-x', '--reruns=2'] ... [2025-08-14 23:45:35.304667] 2025-08-14T23:45:36.0640825Z 2025-08-14T23:45:36.0642391Z cpp/test_jit 1/1 was successful, full logs can be found in artifacts with path test/test-reports/cpp.test_jit_1.1_705b35bfd36edc53_.log 2025-08-14T23:45:36.0643960Z 2025-08-14T23:45:36.1222324Z 2025-08-14T23:45:36.1223585Z cpp/test_lazy 1/1 was successful, full logs can be found in artifacts with path test/test-reports/cpp.test_lazy_1.1_e1a2f0fd405d614d_.log 2025-08-14T23:45:36.1224604Z 2025-08-14T23:45:37.0305526Z Running test batch 'tests to run' cost 6.69 seconds 2025-08-14T23:45:37.6644286Z + pushd test 2025-08-14T23:45:37.6644727Z ~/pytorch/test ~/pytorch 2025-08-14T23:45:37.6645243Z + python cpp/jit/tests_setup.py shutdown 2025-08-14T23:45:38.9830030Z + popd 2025-08-14T23:45:38.9830488Z ~/pytorch 2025-08-14T23:45:38.9830889Z + assert_git_not_dirty 2025-08-14T23:45:38.9831403Z + [[ linux-jammy-rocm-py3.10 != *rocm* ]] 2025-08-14T23:45:38.9832001Z + test_aot_compilation 2025-08-14T23:45:38.9832494Z + echo 'Testing Ahead of Time compilation' 2025-08-14T23:45:38.9833121Z Testing Ahead of Time compilation 2025-08-14T23:45:38.9835009Z + ln -sf /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/lib/libc10.so /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/lib/libc10_hip.so /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/bin 2025-08-14T23:45:38.9852360Z + ln -sf /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/lib/libtorch.so /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/lib/libtorch_global_deps.so /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/lib/libtorch_hip.so /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/lib/libtorch_python.so /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/lib/libtorchbind_test.so /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/bin 2025-08-14T23:45:38.9873604Z + '[' -f /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/bin/test_mobile_nnc ']' 2025-08-14T23:45:38.9875053Z + '[' -f /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/bin/aot_model_compiler_test ']' 2025-08-14T23:45:38.9875991Z + test_custom_script_ops 2025-08-14T23:45:38.9876514Z + echo 'Testing custom script operators' 2025-08-14T23:45:38.9877620Z Testing custom script operators 2025-08-14T23:45:38.9878201Z + [[ linux-jammy-rocm-py3.10 == *s390x* ]] 2025-08-14T23:45:38.9878919Z + CUSTOM_OP_BUILD=/var/lib/jenkins/pytorch/build/custom_test_artifacts/custom-op-build 2025-08-14T23:45:38.9879374Z + pushd test/custom_operator 2025-08-14T23:45:38.9879654Z ~/pytorch/test/custom_operator ~/pytorch 2025-08-14T23:45:38.9880091Z + cp -a /var/lib/jenkins/pytorch/build/custom_test_artifacts/custom-op-build build 2025-08-14T23:45:39.0036962Z + python test_custom_ops.py -v 2025-08-14T23:45:41.4529384Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/hypothesis/entry_points.py:23: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. 2025-08-14T23:45:41.4531922Z import pkg_resources 2025-08-14T23:45:41.5177768Z Test results will be stored in test-reports/python-unittest/test_custom_ops 2025-08-14T23:45:41.5190518Z 2025-08-14T23:45:41.5191243Z Running tests... 2025-08-14T23:45:41.5191961Z ---------------------------------------------------------------------- 2025-08-14T23:45:41.9607860Z test_abstract_impl_pystub_faketensor (__main__.TestCustomOperators) ... /var/lib/jenkins/pytorch/test/custom_operator/my_custom_ops.py:13: FutureWarning: `create_unbacked_symint` is deprecated, please use `new_dynamic_size` instead 2025-08-14T23:45:41.9610037Z nnz = ctx.create_unbacked_symint() 2025-08-14T23:45:41.9679060Z ok (0.449s) 2025-08-14T23:45:41.9711189Z test_abstract_impl_pystub_meta (__main__.TestCustomOperators) ... ok (0.003s) 2025-08-14T23:45:41.9728684Z test_calling_custom_op (__main__.TestCustomOperators) ... ok (0.002s) 2025-08-14T23:45:42.0069572Z test_calling_custom_op_inside_script_module (__main__.TestCustomOperators) ... ok (0.034s) 2025-08-14T23:45:42.0071724Z test_calling_custom_op_string (__main__.TestCustomOperators) ... ok (0.000s) 2025-08-14T23:45:42.0094552Z test_calling_custom_op_with_autograd (__main__.TestCustomOperators) ... /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:841: UserWarning: Using backward() with create_graph=True will create a reference cycle between the parameter and its gradient which can cause a memory leak. We recommend using autograd.grad when creating the graph to avoid this. If you have to use this function, make sure to reset the .grad fields of your parameters to None after use to break the cycle and avoid the leak. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/engine.cpp:1291.) 2025-08-14T23:45:42.0099524Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-08-14T23:45:42.0101581Z ok (0.003s) 2025-08-14T23:45:42.0108056Z test_calling_custom_op_with_autograd_in_nograd_mode (__main__.TestCustomOperators) ... ok (0.001s) 2025-08-14T23:45:42.0109341Z test_custom_library_is_loaded (__main__.TestCustomOperators) ... ok (0.000s) 2025-08-14T23:45:42.0866151Z test_dynamo_pystub_suggestion (__main__.TestCustomOperators) ... ok (0.075s) 2025-08-14T23:45:42.0872809Z test_op_with_incorrect_abstract_impl_pystub (__main__.TestCustomOperators) ... ok (0.001s) 2025-08-14T23:45:42.0879535Z test_op_with_no_abstract_impl_pystub (__main__.TestCustomOperators) ... ok (0.001s) 2025-08-14T23:45:42.0943928Z test_saving_and_loading_script_module_with_custom_op (__main__.TestCustomOperators) ... ok (0.006s) 2025-08-14T23:45:42.0944822Z 2025-08-14T23:45:42.0945113Z ---------------------------------------------------------------------- 2025-08-14T23:45:42.0945801Z Ran 12 tests in 0.575s 2025-08-14T23:45:42.0946100Z 2025-08-14T23:45:42.0946244Z OK 2025-08-14T23:45:42.0946445Z 2025-08-14T23:45:42.0946635Z Generating XML reports... 2025-08-14T23:45:42.0975293Z Generated XML report: test-reports/python-unittest/test_custom_ops/TEST-TestCustomOperators-20250814234541.xml 2025-08-14T23:45:42.7360510Z + python model.py --export-script-module=model.pt 2025-08-14T23:45:44.0529052Z + build/test_custom_ops ./model.pt 2025-08-14T23:45:44.3568004Z [W814 23:45:44.463942208 engine.cpp:1291] Warning: Using backward() with create_graph=True will create a reference cycle between the parameter and its gradient which can cause a memory leak. We recommend using autograd.grad when creating the graph to avoid this. If you have to use this function, make sure to reset the .grad fields of your parameters to None after use to break the cycle and avoid the leak. (function operator()) 2025-08-14T23:45:44.8518357Z ok 2025-08-14T23:45:45.0109348Z + popd 2025-08-14T23:45:45.0112269Z ~/pytorch 2025-08-14T23:45:45.0112732Z + assert_git_not_dirty 2025-08-14T23:45:45.0113300Z + [[ linux-jammy-rocm-py3.10 != *rocm* ]] 2025-08-14T23:45:45.0113883Z + test_custom_backend 2025-08-14T23:45:45.0114354Z + echo 'Testing custom backends' 2025-08-14T23:45:45.0114857Z Testing custom backends 2025-08-14T23:45:45.0115709Z + CUSTOM_BACKEND_BUILD=/var/lib/jenkins/pytorch/build/custom_test_artifacts/custom-backend-build 2025-08-14T23:45:45.0116672Z + pushd test/custom_backend 2025-08-14T23:45:45.0117285Z ~/pytorch/test/custom_backend ~/pytorch 2025-08-14T23:45:45.0118332Z + cp -a /var/lib/jenkins/pytorch/build/custom_test_artifacts/custom-backend-build build 2025-08-14T23:45:45.0330249Z + python test_custom_backend.py -v 2025-08-14T23:45:47.5387345Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/hypothesis/entry_points.py:23: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. 2025-08-14T23:45:47.5388740Z import pkg_resources 2025-08-14T23:45:47.5728492Z Test results will be stored in test-reports/python-unittest/test_custom_backend 2025-08-14T23:45:47.5737483Z 2025-08-14T23:45:47.5737636Z Running tests... 2025-08-14T23:45:47.5737960Z ---------------------------------------------------------------------- 2025-08-14T23:45:47.5742306Z test_execute (__main__.TestCustomBackend) 2025-08-14T23:45:47.7068900Z Test execution using the custom backend. ... ok (0.133s) 2025-08-14T23:45:47.7072523Z test_save_load (__main__.TestCustomBackend) 2025-08-14T23:45:47.7236045Z Test that a lowered module can be executed correctly ... ok (0.017s) 2025-08-14T23:45:47.7236358Z 2025-08-14T23:45:47.7236538Z ---------------------------------------------------------------------- 2025-08-14T23:45:47.7237255Z Ran 2 tests in 0.150s 2025-08-14T23:45:47.7237400Z 2025-08-14T23:45:47.7237480Z OK 2025-08-14T23:45:47.7237577Z 2025-08-14T23:45:47.7237674Z Generating XML reports... 2025-08-14T23:45:47.7260412Z Generated XML report: test-reports/python-unittest/test_custom_backend/TEST-TestCustomBackend-20250814234547.xml 2025-08-14T23:45:48.3420263Z + python backend.py --export-module-to=model.pt 2025-08-14T23:45:49.8152766Z + build/test_custom_backend ./model.pt 2025-08-14T23:45:50.0874543Z Testing custom_backend 2025-08-14T23:45:50.1534348Z OK 2025-08-14T23:45:50.2589764Z + rm -f ./model.pt 2025-08-14T23:45:50.2619727Z + popd 2025-08-14T23:45:50.2620750Z ~/pytorch 2025-08-14T23:45:50.2621864Z + assert_git_not_dirty 2025-08-14T23:45:50.2622450Z + [[ linux-jammy-rocm-py3.10 != *rocm* ]] 2025-08-14T23:45:50.2623047Z + test_torch_function_benchmark 2025-08-14T23:45:50.2623666Z + echo 'Testing __torch_function__ benchmarks' 2025-08-14T23:45:50.2624411Z Testing __torch_function__ benchmarks 2025-08-14T23:45:50.2624998Z + pushd benchmarks/overrides_benchmark 2025-08-14T23:45:50.2625622Z ~/pytorch/benchmarks/overrides_benchmark ~/pytorch 2025-08-14T23:45:50.2626260Z + python bench.py -n 1 -m 2 2025-08-14T23:45:51.3590574Z Type tensor had a minimum time of 0.006198883056640625 us and a standard deviation of 0.9638141491450369 us. 2025-08-14T23:45:51.3592097Z Type SubTensor had a minimum time of 0.013589859008789062 us and a standard deviation of 0.04349554728833027 us. 2025-08-14T23:45:51.3594207Z Type WithTorchFunction had a minimum time of 0.0054836273193359375 us and a standard deviation of 0.01685873940004967 us. 2025-08-14T23:45:51.3595768Z Type SubWithTorchFunction had a minimum time of 0.010013580322265625 us and a standard deviation of 0.006406321062968345 us. 2025-08-14T23:45:51.5926635Z + python pyspybench.py Tensor -n 1 2025-08-14T23:45:52.9503720Z + python pyspybench.py SubTensor -n 1 2025-08-14T23:45:54.2949229Z + python pyspybench.py WithTorchFunction -n 1 2025-08-14T23:45:55.7028832Z + python pyspybench.py SubWithTorchFunction -n 1 2025-08-14T23:45:57.0274532Z + popd 2025-08-14T23:45:57.0275266Z ~/pytorch 2025-08-14T23:45:57.0275844Z + assert_git_not_dirty 2025-08-14T23:45:57.0276282Z + [[ linux-jammy-rocm-py3.10 != *rocm* ]] 2025-08-14T23:45:57.0277551Z + sccache_epilogue 2025-08-14T23:45:57.0278250Z + echo '::group::Sccache Compilation Log' 2025-08-14T23:45:57.0279327Z ##[group]Sccache Compilation Log 2025-08-14T23:45:57.0279950Z + echo '=================== sccache compilation log ===================' 2025-08-14T23:45:57.0280894Z =================== sccache compilation log =================== 2025-08-14T23:45:57.0282130Z + python /var/lib/jenkins/pytorch/.ci/pytorch/print_sccache_log.py /var/lib/jenkins/sccache_error.log 2025-08-14T23:45:57.0406363Z + echo '=========== If your build fails, please take a look at the log above for possible reasons ===========' 2025-08-14T23:45:57.0407439Z =========== If your build fails, please take a look at the log above for possible reasons =========== 2025-08-14T23:45:57.0408161Z + sccache --show-stats 2025-08-14T23:45:57.0436570Z Compile requests 8605 2025-08-14T23:45:57.0437220Z Compile requests executed 310 2025-08-14T23:45:57.0437783Z Cache hits 22 2025-08-14T23:45:57.0438293Z Cache hits (C/C++) 22 2025-08-14T23:45:57.0439508Z Cache misses 288 2025-08-14T23:45:57.0440516Z Cache misses (C/C++) 282 2025-08-14T23:45:57.0441154Z Cache misses (HIP) 6 2025-08-14T23:45:57.0441780Z Cache hits rate 7.10 % 2025-08-14T23:45:57.0442345Z Cache hits rate (C/C++) 7.24 % 2025-08-14T23:45:57.0442667Z Cache hits rate (HIP) 0.00 % 2025-08-14T23:45:57.0442961Z Cache timeouts 0 2025-08-14T23:45:57.0443225Z Cache read errors 0 2025-08-14T23:45:57.0443485Z Forced recaches 0 2025-08-14T23:45:57.0443741Z Cache write errors 0 2025-08-14T23:45:57.0444123Z Cache errors 0 2025-08-14T23:45:57.0444378Z Compilations 288 2025-08-14T23:45:57.0444635Z Compilation failures 0 2025-08-14T23:45:57.0444900Z Non-cacheable compilations 0 2025-08-14T23:45:57.0445173Z Non-cacheable calls 163 2025-08-14T23:45:57.0445436Z Non-compilation calls 8132 2025-08-14T23:45:57.0445713Z Unsupported compiler calls 0 2025-08-14T23:45:57.0445987Z Average cache write 0.000 s 2025-08-14T23:45:57.0446274Z Average compiler 3.203 s 2025-08-14T23:45:57.0446548Z Average cache read hit 0.000 s 2025-08-14T23:45:57.0446829Z Failed distributed compilations 0 2025-08-14T23:45:57.0447017Z 2025-08-14T23:45:57.0447109Z Non-cacheable reasons: 2025-08-14T23:45:57.0447350Z unknown source language 127 2025-08-14T23:45:57.0447610Z -E 36 2025-08-14T23:45:57.0447786Z 2025-08-14T23:45:57.0447971Z Cache location Local disk: "/var/lib/jenkins/.cache/sccache" 2025-08-14T23:45:57.0448353Z Use direct/preprocessor mode? yes 2025-08-14T23:45:57.0448633Z Version (client) 0.10.0 2025-08-14T23:45:57.0448902Z Cache size 18 MiB 2025-08-14T23:45:57.0449180Z Max cache size 10 GiB 2025-08-14T23:45:57.0449460Z + sccache --stop-server 2025-08-14T23:45:57.0479112Z Stopping sccache server... 2025-08-14T23:45:57.0485450Z Compile requests 8605 2025-08-14T23:45:57.0486688Z Compile requests executed 310 2025-08-14T23:45:57.0499324Z Cache hits 22 2025-08-14T23:45:57.0499665Z Cache hits (C/C++) 22 2025-08-14T23:45:57.0499948Z Cache misses 288 2025-08-14T23:45:57.0500226Z Cache misses (C/C++) 282 2025-08-14T23:45:57.0500504Z Cache misses (HIP) 6 2025-08-14T23:45:57.0500785Z Cache hits rate 7.10 % 2025-08-14T23:45:57.0501074Z Cache hits rate (C/C++) 7.24 % 2025-08-14T23:45:57.0501357Z Cache hits rate (HIP) 0.00 % 2025-08-14T23:45:57.0501627Z Cache timeouts 0 2025-08-14T23:45:57.0501898Z Cache read errors 0 2025-08-14T23:45:57.0502164Z Forced recaches 0 2025-08-14T23:45:57.0502434Z Cache write errors 0 2025-08-14T23:45:57.0502693Z Cache errors 0 2025-08-14T23:45:57.0502957Z Compilations 288 2025-08-14T23:45:57.0503229Z Compilation failures 0 2025-08-14T23:45:57.0503501Z Non-cacheable compilations 0 2025-08-14T23:45:57.0503780Z Non-cacheable calls 163 2025-08-14T23:45:57.0504033Z Non-compilation calls 8132 2025-08-14T23:45:57.0504313Z Unsupported compiler calls 0 2025-08-14T23:45:57.0504592Z Average cache write 0.000 s 2025-08-14T23:45:57.0504875Z Average compiler 3.203 s 2025-08-14T23:45:57.0505172Z Average cache read hit 0.000 s 2025-08-14T23:45:57.0505457Z Failed distributed compilations 0 2025-08-14T23:45:57.0505636Z 2025-08-14T23:45:57.0505739Z Non-cacheable reasons: 2025-08-14T23:45:57.0505984Z unknown source language 127 2025-08-14T23:45:57.0506459Z -E 36 2025-08-14T23:45:57.0506736Z 2025-08-14T23:45:57.0506914Z Cache location Local disk: "/var/lib/jenkins/.cache/sccache" 2025-08-14T23:45:57.0507299Z Use direct/preprocessor mode? yes 2025-08-14T23:45:57.0507605Z Version (client) 0.10.0 2025-08-14T23:45:57.0507895Z Cache size 18 MiB 2025-08-14T23:45:57.0508185Z Max cache size 10 GiB 2025-08-14T23:45:57.0508463Z + echo ::endgroup:: 2025-08-14T23:45:57.0508846Z ##[endgroup] 2025-08-14T23:45:57.0595782Z ##[group]Run # copy test results back to the mounted workspace, needed sudo, resulting permissions were correct 2025-08-14T23:45:57.0597430Z # copy test results back to the mounted workspace, needed sudo, resulting permissions were correct 2025-08-14T23:45:57.0599048Z docker exec -t "3d9c9b40c40dc5f1fb23a7678a8237a367285d971448e6116aec4b368a1c9e14" sh -c "cd ../pytorch && sudo cp -R test/test-reports ../workspace/test" 2025-08-14T23:45:57.0645254Z shell: /usr/bin/bash -e {0} 2025-08-14T23:45:57.0645713Z env: 2025-08-14T23:45:57.0646070Z GIT_DEFAULT_BRANCH: main 2025-08-14T23:45:57.0646733Z RUNNER_ARTIFACT_DIR: /home/pytorchci/actions-runner/_work/_temp/artifacts 2025-08-14T23:45:57.0647702Z RUNNER_TEST_RESULTS_DIR: /home/pytorchci/actions-runner/_work/_temp/test-results 2025-08-14T23:45:57.0648616Z RUNNER_DOCS_DIR: /home/pytorchci/actions-runner/_work/_temp/docs 2025-08-14T23:45:57.0650491Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --device /dev/dri --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-08-14T23:45:57.0652141Z AWS_DEFAULT_REGION: us-east-1 2025-08-14T23:45:57.0652683Z AWS_REGION: us-east-1 2025-08-14T23:45:57.0653309Z AWS_ACCESS_KEY_ID: *** 2025-08-14T23:45:57.0654033Z AWS_SECRET_ACCESS_KEY: *** 2025-08-14T23:45:57.0659872Z AWS_SESSION_TOKEN: *** 2025-08-14T23:45:57.0660682Z CONTAINER_NAME: 3d9c9b40c40dc5f1fb23a7678a8237a367285d971448e6116aec4b368a1c9e14 2025-08-14T23:45:57.0661566Z ##[endgroup] 2025-08-14T23:45:57.2045024Z ##[group]Run cat test/**/*_toprint.log || true 2025-08-14T23:45:57.2045732Z cat test/**/*_toprint.log || true 2025-08-14T23:45:57.2101458Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-08-14T23:45:57.2102110Z env: 2025-08-14T23:45:57.2102499Z GIT_DEFAULT_BRANCH: main 2025-08-14T23:45:57.2103333Z RUNNER_ARTIFACT_DIR: /home/pytorchci/actions-runner/_work/_temp/artifacts 2025-08-14T23:45:57.2104552Z RUNNER_TEST_RESULTS_DIR: /home/pytorchci/actions-runner/_work/_temp/test-results 2025-08-14T23:45:57.2105714Z RUNNER_DOCS_DIR: /home/pytorchci/actions-runner/_work/_temp/docs 2025-08-14T23:45:57.2107426Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --device /dev/dri --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-08-14T23:45:57.2108926Z AWS_DEFAULT_REGION: us-east-1 2025-08-14T23:45:57.2109431Z AWS_REGION: us-east-1 2025-08-14T23:45:57.2110036Z AWS_ACCESS_KEY_ID: *** 2025-08-14T23:45:57.2110694Z AWS_SECRET_ACCESS_KEY: *** 2025-08-14T23:45:57.2121128Z AWS_SESSION_TOKEN: *** 2025-08-14T23:45:57.2121898Z CONTAINER_NAME: 3d9c9b40c40dc5f1fb23a7678a8237a367285d971448e6116aec4b368a1c9e14 2025-08-14T23:45:57.2122689Z ##[endgroup] 2025-08-14T23:45:57.2361758Z cat: 'test/**/*_toprint.log': No such file or directory 2025-08-14T23:45:57.2587381Z Prepare all required actions 2025-08-14T23:45:57.2588400Z Getting action download info 2025-08-14T23:45:57.5926659Z Download action repository 'seemethere/upload-artifact-s3@v5' (SHA:baba72d0712b404f646cebe0730933554ebce96a) 2025-08-14T23:45:58.2239367Z Download action repository 'actions/upload-artifact@v4' (SHA:ea165f8d65b6e75b540449e92b4886f43607fa02) 2025-08-14T23:45:58.9832152Z ##[group]Run ./.github/actions/upload-test-artifacts 2025-08-14T23:45:58.9832431Z with: 2025-08-14T23:45:58.9832598Z use-gha: true 2025-08-14T23:45:58.9832974Z file-suffix: test-default-2-6-linux.rocm.gpu.2_48127805673 2025-08-14T23:45:58.9833269Z s3-bucket: gha-artifacts 2025-08-14T23:45:58.9833467Z env: 2025-08-14T23:45:58.9833630Z GIT_DEFAULT_BRANCH: main 2025-08-14T23:45:58.9833910Z RUNNER_ARTIFACT_DIR: /home/pytorchci/actions-runner/_work/_temp/artifacts 2025-08-14T23:45:58.9834332Z RUNNER_TEST_RESULTS_DIR: /home/pytorchci/actions-runner/_work/_temp/test-results 2025-08-14T23:45:58.9834762Z RUNNER_DOCS_DIR: /home/pytorchci/actions-runner/_work/_temp/docs 2025-08-14T23:45:58.9835478Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --device /dev/dri --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-08-14T23:45:58.9836232Z AWS_DEFAULT_REGION: us-east-1 2025-08-14T23:45:58.9836443Z AWS_REGION: us-east-1 2025-08-14T23:45:58.9836688Z AWS_ACCESS_KEY_ID: *** 2025-08-14T23:45:58.9836971Z AWS_SECRET_ACCESS_KEY: *** 2025-08-14T23:45:58.9842560Z AWS_SESSION_TOKEN: *** 2025-08-14T23:45:58.9842909Z CONTAINER_NAME: 3d9c9b40c40dc5f1fb23a7678a8237a367285d971448e6116aec4b368a1c9e14 2025-08-14T23:45:58.9843250Z ##[endgroup] 2025-08-14T23:45:58.9900820Z ##[group]Run actions/upload-artifact@v4 2025-08-14T23:45:58.9901044Z with: 2025-08-14T23:45:58.9901340Z name: test-jsons-runattempt1-test-default-2-6-linux.rocm.gpu.2_48127805673.zip 2025-08-14T23:45:58.9901713Z retention-days: 14 2025-08-14T23:45:58.9901910Z if-no-files-found: warn 2025-08-14T23:45:58.9902101Z path: test/**/*.json 2025-08-14T23:45:58.9902281Z compression-level: 6 2025-08-14T23:45:58.9902461Z overwrite: false 2025-08-14T23:45:58.9902636Z include-hidden-files: false 2025-08-14T23:45:58.9902822Z env: 2025-08-14T23:45:58.9902981Z GIT_DEFAULT_BRANCH: main 2025-08-14T23:45:58.9903267Z RUNNER_ARTIFACT_DIR: /home/pytorchci/actions-runner/_work/_temp/artifacts 2025-08-14T23:45:58.9903691Z RUNNER_TEST_RESULTS_DIR: /home/pytorchci/actions-runner/_work/_temp/test-results 2025-08-14T23:45:58.9904086Z RUNNER_DOCS_DIR: /home/pytorchci/actions-runner/_work/_temp/docs 2025-08-14T23:45:58.9904774Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --device /dev/dri --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-08-14T23:45:58.9905387Z AWS_DEFAULT_REGION: us-east-1 2025-08-14T23:45:58.9905585Z AWS_REGION: us-east-1 2025-08-14T23:45:58.9905808Z AWS_ACCESS_KEY_ID: *** 2025-08-14T23:45:58.9906077Z AWS_SECRET_ACCESS_KEY: *** 2025-08-14T23:45:58.9910280Z AWS_SESSION_TOKEN: *** 2025-08-14T23:45:58.9910590Z CONTAINER_NAME: 3d9c9b40c40dc5f1fb23a7678a8237a367285d971448e6116aec4b368a1c9e14 2025-08-14T23:45:58.9910917Z ##[endgroup] 2025-08-14T23:45:59.7496684Z With the provided path, there will be 7 files uploaded 2025-08-14T23:45:59.7502460Z Artifact name is valid! 2025-08-14T23:45:59.7503077Z Root directory input is valid! 2025-08-14T23:46:01.1489717Z Beginning upload of artifact content to blob storage 2025-08-14T23:46:01.7447372Z Uploaded bytes 45828 2025-08-14T23:46:01.8669720Z Finished uploading artifact content to blob storage! 2025-08-14T23:46:01.8675319Z SHA256 digest of uploaded artifact zip is 4e23e58bcd35ccdd587455270629905620e74036a231d33670d4b4b9c95d53ea 2025-08-14T23:46:01.8678048Z Finalizing artifact upload 2025-08-14T23:46:02.0107796Z Artifact test-jsons-runattempt1-test-default-2-6-linux.rocm.gpu.2_48127805673.zip.zip successfully finalized. Artifact ID 3770227283 2025-08-14T23:46:02.0110027Z Artifact test-jsons-runattempt1-test-default-2-6-linux.rocm.gpu.2_48127805673.zip has been successfully uploaded! Final size is 45828 bytes. Artifact ID is 3770227283 2025-08-14T23:46:02.0120985Z Artifact download URL: https://github.com/pytorch/pytorch/actions/runs/16976255019/artifacts/3770227283 2025-08-14T23:46:02.0442914Z ##[group]Run actions/upload-artifact@v4 2025-08-14T23:46:02.0443484Z with: 2025-08-14T23:46:02.0444222Z name: test-reports-runattempt1-test-default-2-6-linux.rocm.gpu.2_48127805673.zip 2025-08-14T23:46:02.0445369Z retention-days: 14 2025-08-14T23:46:02.0445840Z if-no-files-found: ignore 2025-08-14T23:46:02.0446363Z path: test/**/*.xml test/**/*.csv 2025-08-14T23:46:02.0446900Z compression-level: 6 2025-08-14T23:46:02.0447341Z overwrite: false 2025-08-14T23:46:02.0447789Z include-hidden-files: false 2025-08-14T23:46:02.0448264Z env: 2025-08-14T23:46:02.0448647Z GIT_DEFAULT_BRANCH: main 2025-08-14T23:46:02.0449390Z RUNNER_ARTIFACT_DIR: /home/pytorchci/actions-runner/_work/_temp/artifacts 2025-08-14T23:46:02.0450431Z RUNNER_TEST_RESULTS_DIR: /home/pytorchci/actions-runner/_work/_temp/test-results 2025-08-14T23:46:02.0451633Z RUNNER_DOCS_DIR: /home/pytorchci/actions-runner/_work/_temp/docs 2025-08-14T23:46:02.0453298Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --device /dev/dri --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-08-14T23:46:02.0454815Z AWS_DEFAULT_REGION: us-east-1 2025-08-14T23:46:02.0455338Z AWS_REGION: us-east-1 2025-08-14T23:46:02.0455921Z AWS_ACCESS_KEY_ID: *** 2025-08-14T23:46:02.0456604Z AWS_SECRET_ACCESS_KEY: *** 2025-08-14T23:46:02.0467216Z AWS_SESSION_TOKEN: *** 2025-08-14T23:46:02.0467993Z CONTAINER_NAME: 3d9c9b40c40dc5f1fb23a7678a8237a367285d971448e6116aec4b368a1c9e14 2025-08-14T23:46:02.0468830Z ##[endgroup] 2025-08-14T23:46:02.8144355Z With the provided path, there will be 144 files uploaded 2025-08-14T23:46:02.8149479Z Artifact name is valid! 2025-08-14T23:46:02.8150078Z Root directory input is valid! 2025-08-14T23:46:04.3968923Z Beginning upload of artifact content to blob storage 2025-08-14T23:46:05.5174020Z Uploaded bytes 679622 2025-08-14T23:46:05.6032233Z Finished uploading artifact content to blob storage! 2025-08-14T23:46:05.6037941Z SHA256 digest of uploaded artifact zip is 216fc0639e6194242fca9e8b51606cf57fe97ada0ce2c8be311fae08c100baf7 2025-08-14T23:46:05.6040743Z Finalizing artifact upload 2025-08-14T23:46:05.7413925Z Artifact test-reports-runattempt1-test-default-2-6-linux.rocm.gpu.2_48127805673.zip.zip successfully finalized. Artifact ID 3770227583 2025-08-14T23:46:05.7416257Z Artifact test-reports-runattempt1-test-default-2-6-linux.rocm.gpu.2_48127805673.zip has been successfully uploaded! Final size is 679622 bytes. Artifact ID is 3770227583 2025-08-14T23:46:05.7427278Z Artifact download URL: https://github.com/pytorch/pytorch/actions/runs/16976255019/artifacts/3770227583 2025-08-14T23:46:05.7774634Z ##[group]Run actions/upload-artifact@v4 2025-08-14T23:46:05.7775218Z with: 2025-08-14T23:46:05.7775894Z name: logs-runattempt1-test-default-2-6-linux.rocm.gpu.2_48127805673.zip 2025-08-14T23:46:05.7776735Z retention-days: 14 2025-08-14T23:46:05.7777204Z if-no-files-found: ignore 2025-08-14T23:46:05.7777736Z path: usage_log.txt test/**/*.log 2025-08-14T23:46:05.7778284Z compression-level: 6 2025-08-14T23:46:05.7778737Z overwrite: false 2025-08-14T23:46:05.7779191Z include-hidden-files: false 2025-08-14T23:46:05.7779694Z env: 2025-08-14T23:46:05.7780078Z GIT_DEFAULT_BRANCH: main 2025-08-14T23:46:05.7780813Z RUNNER_ARTIFACT_DIR: /home/pytorchci/actions-runner/_work/_temp/artifacts 2025-08-14T23:46:05.7781859Z RUNNER_TEST_RESULTS_DIR: /home/pytorchci/actions-runner/_work/_temp/test-results 2025-08-14T23:46:05.7782834Z RUNNER_DOCS_DIR: /home/pytorchci/actions-runner/_work/_temp/docs 2025-08-14T23:46:05.7784916Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --device /dev/dri --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-08-14T23:46:05.7786477Z AWS_DEFAULT_REGION: us-east-1 2025-08-14T23:46:05.7786984Z AWS_REGION: us-east-1 2025-08-14T23:46:05.7787548Z AWS_ACCESS_KEY_ID: *** 2025-08-14T23:46:05.7788242Z AWS_SECRET_ACCESS_KEY: *** 2025-08-14T23:46:05.7798546Z AWS_SESSION_TOKEN: *** 2025-08-14T23:46:05.7799317Z CONTAINER_NAME: 3d9c9b40c40dc5f1fb23a7678a8237a367285d971448e6116aec4b368a1c9e14 2025-08-14T23:46:05.7800516Z ##[endgroup] 2025-08-14T23:46:06.7230813Z Multiple search paths detected. Calculating the least common ancestor of all paths 2025-08-14T23:46:06.7233973Z The least common ancestor is /home/pytorchci/actions-runner/_work/pytorch/pytorch. This will be the root directory of the artifact 2025-08-14T23:46:06.7235298Z With the provided path, there will be 144 files uploaded 2025-08-14T23:46:06.7242668Z Artifact name is valid! 2025-08-14T23:46:06.7243663Z Root directory input is valid! 2025-08-14T23:46:08.2969035Z Beginning upload of artifact content to blob storage 2025-08-14T23:46:09.5766422Z Uploaded bytes 912632 2025-08-14T23:46:09.6658892Z Finished uploading artifact content to blob storage! 2025-08-14T23:46:09.6664480Z SHA256 digest of uploaded artifact zip is e2724fa6e4026cbc1c100f61c00bfc067085f998941927316737ea3788a1650a 2025-08-14T23:46:09.6667252Z Finalizing artifact upload 2025-08-14T23:46:09.8178821Z Artifact logs-runattempt1-test-default-2-6-linux.rocm.gpu.2_48127805673.zip.zip successfully finalized. Artifact ID 3770227924 2025-08-14T23:46:09.8180966Z Artifact logs-runattempt1-test-default-2-6-linux.rocm.gpu.2_48127805673.zip has been successfully uploaded! Final size is 912632 bytes. Artifact ID is 3770227924 2025-08-14T23:46:09.8192123Z Artifact download URL: https://github.com/pytorch/pytorch/actions/runs/16976255019/artifacts/3770227924 2025-08-14T23:46:09.8522426Z ##[group]Run # shellcheck disable=SC2156 2025-08-14T23:46:09.8523164Z # shellcheck disable=SC2156 2025-08-14T23:46:09.8524206Z find . -iname "core.[1-9]*" -exec docker exec "${CONTAINER_NAME}" sh -c "gdb python {} -ex 'bt' -ex 'q'" \; 2025-08-14T23:46:09.8575395Z shell: /usr/bin/bash -e {0} 2025-08-14T23:46:09.8575940Z env: 2025-08-14T23:46:09.8576359Z GIT_DEFAULT_BRANCH: main 2025-08-14T23:46:09.8577127Z RUNNER_ARTIFACT_DIR: /home/pytorchci/actions-runner/_work/_temp/artifacts 2025-08-14T23:46:09.8578251Z RUNNER_TEST_RESULTS_DIR: /home/pytorchci/actions-runner/_work/_temp/test-results 2025-08-14T23:46:09.8579294Z RUNNER_DOCS_DIR: /home/pytorchci/actions-runner/_work/_temp/docs 2025-08-14T23:46:09.8580997Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --device /dev/dri --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-08-14T23:46:09.8582578Z AWS_DEFAULT_REGION: us-east-1 2025-08-14T23:46:09.8583111Z AWS_REGION: us-east-1 2025-08-14T23:46:09.8583703Z AWS_ACCESS_KEY_ID: *** 2025-08-14T23:46:09.8584400Z AWS_SECRET_ACCESS_KEY: *** 2025-08-14T23:46:09.8594718Z AWS_SESSION_TOKEN: *** 2025-08-14T23:46:09.8595504Z CONTAINER_NAME: 3d9c9b40c40dc5f1fb23a7678a8237a367285d971448e6116aec4b368a1c9e14 2025-08-14T23:46:09.8596335Z ##[endgroup] 2025-08-14T23:46:10.2635561Z ##[group]Run aws-actions/configure-aws-credentials@ececac1a45f3b08a01d2dd070d28d111c5fe6722 2025-08-14T23:46:10.2636579Z with: 2025-08-14T23:46:10.2637329Z role-to-assume: arn:aws:iam::308535385114:role/gha_workflow_upload-benchmark-results 2025-08-14T23:46:10.2638261Z role-duration-seconds: 18000 2025-08-14T23:46:10.2638814Z aws-region: us-east-1 2025-08-14T23:46:10.2639335Z audience: sts.amazonaws.com 2025-08-14T23:46:10.2639853Z env: 2025-08-14T23:46:10.2640267Z GIT_DEFAULT_BRANCH: main 2025-08-14T23:46:10.2641169Z RUNNER_ARTIFACT_DIR: /home/pytorchci/actions-runner/_work/_temp/artifacts 2025-08-14T23:46:10.2642229Z RUNNER_TEST_RESULTS_DIR: /home/pytorchci/actions-runner/_work/_temp/test-results 2025-08-14T23:46:10.2643241Z RUNNER_DOCS_DIR: /home/pytorchci/actions-runner/_work/_temp/docs 2025-08-14T23:46:10.2645029Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --device /dev/dri --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-08-14T23:46:10.2646576Z AWS_DEFAULT_REGION: us-east-1 2025-08-14T23:46:10.2647100Z AWS_REGION: us-east-1 2025-08-14T23:46:10.2647715Z AWS_ACCESS_KEY_ID: *** 2025-08-14T23:46:10.2648432Z AWS_SECRET_ACCESS_KEY: *** 2025-08-14T23:46:10.2659150Z AWS_SESSION_TOKEN: *** 2025-08-14T23:46:10.2659963Z CONTAINER_NAME: 3d9c9b40c40dc5f1fb23a7678a8237a367285d971448e6116aec4b368a1c9e14 2025-08-14T23:46:10.2660828Z ##[endgroup] 2025-08-14T23:46:10.6713086Z Assuming role with OIDC 2025-08-14T23:46:11.0961875Z Authenticated as assumedRoleId AROAUPVRELQNA5GQHA6IA:GitHubActions 2025-08-14T23:46:11.2194005Z ##[group]Run pytorch/test-infra/.github/actions/upload-benchmark-results@main 2025-08-14T23:46:11.2194929Z with: 2025-08-14T23:46:11.2195437Z benchmark-results-dir: test/test-reports 2025-08-14T23:46:11.2196052Z dry-run: false 2025-08-14T23:46:11.2196775Z schema-version: v3 2025-08-14T23:46:11.2197722Z github-token: *** 2025-08-14T23:46:11.2198193Z env: 2025-08-14T23:46:11.2198629Z GIT_DEFAULT_BRANCH: main 2025-08-14T23:46:11.2199409Z RUNNER_ARTIFACT_DIR: /home/pytorchci/actions-runner/_work/_temp/artifacts 2025-08-14T23:46:11.2200666Z RUNNER_TEST_RESULTS_DIR: /home/pytorchci/actions-runner/_work/_temp/test-results 2025-08-14T23:46:11.2201723Z RUNNER_DOCS_DIR: /home/pytorchci/actions-runner/_work/_temp/docs 2025-08-14T23:46:11.2203449Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --device /dev/dri --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-08-14T23:46:11.2205082Z AWS_DEFAULT_REGION: us-east-1 2025-08-14T23:46:11.2205643Z AWS_REGION: us-east-1 2025-08-14T23:46:11.2206221Z AWS_ACCESS_KEY_ID: *** 2025-08-14T23:46:11.2206940Z AWS_SECRET_ACCESS_KEY: *** 2025-08-14T23:46:11.2218023Z AWS_SESSION_TOKEN: *** 2025-08-14T23:46:11.2218832Z CONTAINER_NAME: 3d9c9b40c40dc5f1fb23a7678a8237a367285d971448e6116aec4b368a1c9e14 2025-08-14T23:46:11.2219697Z ##[endgroup] 2025-08-14T23:46:11.2255351Z ##[group]Run set -eux 2025-08-14T23:46:11.2255938Z set -eux 2025-08-14T23:46:11.2256638Z python3 -mpip install boto3==1.35.33 psutil==7.0.0 pynvml==12.0.0 2025-08-14T23:46:11.2257450Z  2025-08-14T23:46:11.2257903Z DEVICE_NAME="" 2025-08-14T23:46:11.2258435Z DEVICE_TYPE="" 2025-08-14T23:46:11.2258908Z  2025-08-14T23:46:11.2259391Z if command -v nvidia-smi; then 2025-08-14T23:46:11.2260255Z  # NB: I'm using PyTorch here to get the device name, however, it needs to 2025-08-14T23:46:11.2261309Z  # install the correct version of PyTorch manually for now. Any PyTorch 2025-08-14T23:46:11.2262296Z  # version is fine, I just use 2.7.1 to satify PYPIDEP linter 2025-08-14T23:46:11.2263106Z  python3 -mpip install torch==2.7.1 2025-08-14T23:46:11.2263774Z elif command -v rocminfo; then 2025-08-14T23:46:11.2264598Z  # NB: Installing torch on ROCm runner with pip here causes CI to fail 2025-08-14T23:46:11.2265615Z  # with a memoryview is too large error only on MI300 runners. Is pip 2025-08-14T23:46:11.2266620Z  # version on ROCm runner there too old? As a workaround, let's use the 2025-08-14T23:46:11.2267545Z  # GPU device name coming from rocminfo instead 2025-08-14T23:46:11.2268233Z  DEVICE_NAME=rocm 2025-08-14T23:46:11.2269148Z  DEVICE_TYPE=$(rocminfo | grep "Marketing Name" | tail -n1 | awk -F':' '{print $2}' | xargs) 2025-08-14T23:46:11.2270100Z fi 2025-08-14T23:46:11.2270537Z  2025-08-14T23:46:11.2271074Z echo "DEVICE_NAME=$DEVICE_NAME" >> $GITHUB_ENV 2025-08-14T23:46:11.2271847Z echo "DEVICE_TYPE=$DEVICE_TYPE" >> $GITHUB_ENV 2025-08-14T23:46:11.2325826Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-08-14T23:46:11.2326564Z env: 2025-08-14T23:46:11.2327007Z GIT_DEFAULT_BRANCH: main 2025-08-14T23:46:11.2327804Z RUNNER_ARTIFACT_DIR: /home/pytorchci/actions-runner/_work/_temp/artifacts 2025-08-14T23:46:11.2328936Z RUNNER_TEST_RESULTS_DIR: /home/pytorchci/actions-runner/_work/_temp/test-results 2025-08-14T23:46:11.2329981Z RUNNER_DOCS_DIR: /home/pytorchci/actions-runner/_work/_temp/docs 2025-08-14T23:46:11.2331986Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --device /dev/dri --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-08-14T23:46:11.2333559Z AWS_DEFAULT_REGION: us-east-1 2025-08-14T23:46:11.2334112Z AWS_REGION: us-east-1 2025-08-14T23:46:11.2335104Z AWS_ACCESS_KEY_ID: *** 2025-08-14T23:46:11.2335866Z AWS_SECRET_ACCESS_KEY: *** 2025-08-14T23:46:11.2346886Z AWS_SESSION_TOKEN: *** 2025-08-14T23:46:11.2347709Z CONTAINER_NAME: 3d9c9b40c40dc5f1fb23a7678a8237a367285d971448e6116aec4b368a1c9e14 2025-08-14T23:46:11.2348775Z ##[endgroup] 2025-08-14T23:46:11.2430648Z + python3 -mpip install boto3==1.35.33 psutil==7.0.0 pynvml==12.0.0 2025-08-14T23:46:11.5079044Z Defaulting to user installation because normal site-packages is not writeable 2025-08-14T23:46:11.6011271Z Requirement already satisfied: boto3==1.35.33 in /home/pytorchci/.local/lib/python3.10/site-packages (1.35.33) 2025-08-14T23:46:11.6015591Z Requirement already satisfied: psutil==7.0.0 in /home/pytorchci/.local/lib/python3.10/site-packages (7.0.0) 2025-08-14T23:46:11.6024915Z Requirement already satisfied: pynvml==12.0.0 in /home/pytorchci/.local/lib/python3.10/site-packages (12.0.0) 2025-08-14T23:46:11.6059632Z Requirement already satisfied: s3transfer<0.11.0,>=0.10.0 in /home/pytorchci/.local/lib/python3.10/site-packages (from boto3==1.35.33) (0.10.4) 2025-08-14T23:46:11.6064075Z Requirement already satisfied: botocore<1.36.0,>=1.35.33 in /home/pytorchci/.local/lib/python3.10/site-packages (from boto3==1.35.33) (1.35.99) 2025-08-14T23:46:11.6068179Z Requirement already satisfied: jmespath<2.0.0,>=0.7.1 in /home/pytorchci/.local/lib/python3.10/site-packages (from boto3==1.35.33) (1.0.1) 2025-08-14T23:46:11.6222198Z Requirement already satisfied: nvidia-ml-py<13.0.0a0,>=12.0.0 in /home/pytorchci/.local/lib/python3.10/site-packages (from pynvml==12.0.0) (12.575.51) 2025-08-14T23:46:11.6269438Z Requirement already satisfied: urllib3!=2.2.0,<3,>=1.25.4 in /usr/lib/python3/dist-packages (from botocore<1.36.0,>=1.35.33->boto3==1.35.33) (1.26.5) 2025-08-14T23:46:11.6276473Z Requirement already satisfied: python-dateutil<3.0.0,>=2.1 in /home/pytorchci/.local/lib/python3.10/site-packages (from botocore<1.36.0,>=1.35.33->boto3==1.35.33) (2.9.0.post0) 2025-08-14T23:46:11.6319723Z Requirement already satisfied: six>=1.5 in /usr/lib/python3/dist-packages (from python-dateutil<3.0.0,>=2.1->botocore<1.36.0,>=1.35.33->boto3==1.35.33) (1.16.0) 2025-08-14T23:46:11.8892585Z + DEVICE_NAME= 2025-08-14T23:46:11.8893130Z + DEVICE_TYPE= 2025-08-14T23:46:11.8893669Z + command -v nvidia-smi 2025-08-14T23:46:11.8894209Z + command -v rocminfo 2025-08-14T23:46:11.8894709Z + DEVICE_NAME=rocm 2025-08-14T23:46:11.8895171Z /usr/bin/rocminfo 2025-08-14T23:46:11.8906759Z ++ rocminfo 2025-08-14T23:46:11.8909426Z ++ grep 'Marketing Name' 2025-08-14T23:46:11.8911300Z ++ tail -n1 2025-08-14T23:46:11.8912981Z ++ awk -F: '{print $2}' 2025-08-14T23:46:11.8917545Z ++ xargs 2025-08-14T23:46:11.9996545Z + DEVICE_TYPE='AMD Instinct MI210' 2025-08-14T23:46:11.9997158Z + echo DEVICE_NAME=rocm 2025-08-14T23:46:11.9997667Z + echo 'DEVICE_TYPE=AMD Instinct MI210' 2025-08-14T23:46:12.0064184Z ##[group]Run set -eux 2025-08-14T23:46:12.0064500Z set -eux 2025-08-14T23:46:12.0064745Z  2025-08-14T23:46:12.0065033Z if [[ -z "${GITHUB_TOKEN}" ]]; then 2025-08-14T23:46:12.0065434Z  echo "Missing github-token input" 2025-08-14T23:46:12.0065768Z  exit 1 2025-08-14T23:46:12.0066032Z fi 2025-08-14T23:46:12.0118950Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-08-14T23:46:12.0119742Z env: 2025-08-14T23:46:12.0120235Z GIT_DEFAULT_BRANCH: main 2025-08-14T23:46:12.0121189Z RUNNER_ARTIFACT_DIR: /home/pytorchci/actions-runner/_work/_temp/artifacts 2025-08-14T23:46:12.0122405Z RUNNER_TEST_RESULTS_DIR: /home/pytorchci/actions-runner/_work/_temp/test-results 2025-08-14T23:46:12.0123517Z RUNNER_DOCS_DIR: /home/pytorchci/actions-runner/_work/_temp/docs 2025-08-14T23:46:12.0125653Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --device /dev/dri --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-08-14T23:46:12.0127313Z AWS_DEFAULT_REGION: us-east-1 2025-08-14T23:46:12.0128399Z AWS_REGION: us-east-1 2025-08-14T23:46:12.0129204Z AWS_ACCESS_KEY_ID: *** 2025-08-14T23:46:12.0130130Z AWS_SECRET_ACCESS_KEY: *** 2025-08-14T23:46:12.0141747Z AWS_SESSION_TOKEN: *** 2025-08-14T23:46:12.0142651Z CONTAINER_NAME: 3d9c9b40c40dc5f1fb23a7678a8237a367285d971448e6116aec4b368a1c9e14 2025-08-14T23:46:12.0143688Z DEVICE_NAME: rocm 2025-08-14T23:46:12.0144190Z DEVICE_TYPE: AMD Instinct MI210 2025-08-14T23:46:12.0145032Z GITHUB_TOKEN: *** 2025-08-14T23:46:12.0145538Z ##[endgroup] 2025-08-14T23:46:12.0233053Z + [[ -z *** ]] 2025-08-14T23:46:12.0318071Z ##[group]Run pytorch/test-infra/.github/actions/get-workflow-job-id@main 2025-08-14T23:46:12.0318920Z with: 2025-08-14T23:46:12.0319620Z github-token: *** 2025-08-14T23:46:12.0320098Z env: 2025-08-14T23:46:12.0320693Z GIT_DEFAULT_BRANCH: main 2025-08-14T23:46:12.0321473Z RUNNER_ARTIFACT_DIR: /home/pytorchci/actions-runner/_work/_temp/artifacts 2025-08-14T23:46:12.0322614Z RUNNER_TEST_RESULTS_DIR: /home/pytorchci/actions-runner/_work/_temp/test-results 2025-08-14T23:46:12.0323654Z RUNNER_DOCS_DIR: /home/pytorchci/actions-runner/_work/_temp/docs 2025-08-14T23:46:12.0325373Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --device /dev/dri --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-08-14T23:46:12.0326935Z AWS_DEFAULT_REGION: us-east-1 2025-08-14T23:46:12.0327603Z AWS_REGION: us-east-1 2025-08-14T23:46:12.0328291Z AWS_ACCESS_KEY_ID: *** 2025-08-14T23:46:12.0329152Z AWS_SECRET_ACCESS_KEY: *** 2025-08-14T23:46:12.0340749Z AWS_SESSION_TOKEN: *** 2025-08-14T23:46:12.0341669Z CONTAINER_NAME: 3d9c9b40c40dc5f1fb23a7678a8237a367285d971448e6116aec4b368a1c9e14 2025-08-14T23:46:12.0342541Z DEVICE_NAME: rocm 2025-08-14T23:46:12.0343023Z DEVICE_TYPE: AMD Instinct MI210 2025-08-14T23:46:12.0343551Z ##[endgroup] 2025-08-14T23:46:12.0374325Z ##[group]Run set -eux 2025-08-14T23:46:12.0374985Z set -eux 2025-08-14T23:46:12.0375555Z  2025-08-14T23:46:12.0376492Z python3 "${GITHUB_ACTION_PATH}/../../scripts/get_workflow_job_id.py" "${GITHUB_RUN_ID}" "${RUNNER_NAME}" 2025-08-14T23:46:12.0429772Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-08-14T23:46:12.0430553Z env: 2025-08-14T23:46:12.0431030Z GIT_DEFAULT_BRANCH: main 2025-08-14T23:46:12.0431823Z RUNNER_ARTIFACT_DIR: /home/pytorchci/actions-runner/_work/_temp/artifacts 2025-08-14T23:46:12.0432991Z RUNNER_TEST_RESULTS_DIR: /home/pytorchci/actions-runner/_work/_temp/test-results 2025-08-14T23:46:12.0434087Z RUNNER_DOCS_DIR: /home/pytorchci/actions-runner/_work/_temp/docs 2025-08-14T23:46:12.0435845Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --device /dev/dri --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-08-14T23:46:12.0437422Z AWS_DEFAULT_REGION: us-east-1 2025-08-14T23:46:12.0437989Z AWS_REGION: us-east-1 2025-08-14T23:46:12.0438629Z AWS_ACCESS_KEY_ID: *** 2025-08-14T23:46:12.0439388Z AWS_SECRET_ACCESS_KEY: *** 2025-08-14T23:46:12.0450749Z AWS_SESSION_TOKEN: *** 2025-08-14T23:46:12.0451725Z CONTAINER_NAME: 3d9c9b40c40dc5f1fb23a7678a8237a367285d971448e6116aec4b368a1c9e14 2025-08-14T23:46:12.0452762Z DEVICE_NAME: rocm 2025-08-14T23:46:12.0453377Z DEVICE_TYPE: AMD Instinct MI210 2025-08-14T23:46:12.0454305Z GITHUB_TOKEN: *** 2025-08-14T23:46:12.0454865Z ##[endgroup] 2025-08-14T23:46:12.0535961Z + python3 /home/pytorchci/actions-runner/_work/_actions/pytorch/test-infra/main/.github/actions/get-workflow-job-id/../../scripts/get_workflow_job_id.py 16976255019 pytorch-rocm-hw-43 2025-08-14T23:46:12.8588418Z setting job-id=48127805673 2025-08-14T23:46:12.8589415Z setting job-name=linux-jammy-rocm-py3.10 / test (default, 2, 6, linux.rocm.gpu.2) 2025-08-14T23:46:12.8772659Z ##[group]Run set -eux 2025-08-14T23:46:12.8773157Z set -eux 2025-08-14T23:46:12.8773583Z  2025-08-14T23:46:12.8774644Z python3 "${GITHUB_ACTION_PATH}/../../scripts/benchmarks/gather_metadata.py" \ 2025-08-14T23:46:12.8775562Z  --schema-version "${SCHEMA_VERSION}" \ 2025-08-14T23:46:12.8776200Z  --repo "${REPO}" \ 2025-08-14T23:46:12.8777026Z  --head-branch "${HEAD_BRANCH}" \ 2025-08-14T23:46:12.8777687Z  --head-sha "${HEAD_SHA}" \ 2025-08-14T23:46:12.8778337Z  --workflow-id "${WORKFLOW_RUN_ID}" \ 2025-08-14T23:46:12.8779003Z  --run-attempt "${RUN_ATTEMPT}" \ 2025-08-14T23:46:12.8779633Z  --job-id "${JOB_ID}" \ 2025-08-14T23:46:12.8780226Z  --job-name "${JOB_NAME}" 2025-08-14T23:46:12.8827942Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-08-14T23:46:12.8828675Z env: 2025-08-14T23:46:12.8829129Z GIT_DEFAULT_BRANCH: main 2025-08-14T23:46:12.8829942Z RUNNER_ARTIFACT_DIR: /home/pytorchci/actions-runner/_work/_temp/artifacts 2025-08-14T23:46:12.8831112Z RUNNER_TEST_RESULTS_DIR: /home/pytorchci/actions-runner/_work/_temp/test-results 2025-08-14T23:46:12.8832183Z RUNNER_DOCS_DIR: /home/pytorchci/actions-runner/_work/_temp/docs 2025-08-14T23:46:12.8833938Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --device /dev/dri --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-08-14T23:46:12.8835567Z AWS_DEFAULT_REGION: us-east-1 2025-08-14T23:46:12.8836135Z AWS_REGION: us-east-1 2025-08-14T23:46:12.8836785Z AWS_ACCESS_KEY_ID: *** 2025-08-14T23:46:12.8837546Z AWS_SECRET_ACCESS_KEY: *** 2025-08-14T23:46:12.8848705Z AWS_SESSION_TOKEN: *** 2025-08-14T23:46:12.8849476Z CONTAINER_NAME: 3d9c9b40c40dc5f1fb23a7678a8237a367285d971448e6116aec4b368a1c9e14 2025-08-14T23:46:12.8850337Z DEVICE_NAME: rocm 2025-08-14T23:46:12.8850810Z DEVICE_TYPE: AMD Instinct MI210 2025-08-14T23:46:12.8851330Z SCHEMA_VERSION: v3 2025-08-14T23:46:12.8851798Z REPO: pytorch/pytorch 2025-08-14T23:46:12.8852283Z HEAD_BRANCH: refs/heads/main 2025-08-14T23:46:12.8852869Z HEAD_SHA: 1fc683cf17c8c673044538d10266c00f92987be2 2025-08-14T23:46:12.8853484Z WORKFLOW_RUN_ID: 16976255019 2025-08-14T23:46:12.8853976Z RUN_ATTEMPT: 1 2025-08-14T23:46:12.8854402Z JOB_ID: 48127805673 2025-08-14T23:46:12.8855097Z JOB_NAME: linux-jammy-rocm-py3.10 / test (default, 2, 6, linux.rocm.gpu.2) 2025-08-14T23:46:12.8855857Z ##[endgroup] 2025-08-14T23:46:12.8937796Z + python3 /home/pytorchci/actions-runner/_work/_actions/pytorch/test-infra/main/.github/actions/upload-benchmark-results/../../scripts/benchmarks/gather_metadata.py --schema-version v3 --repo pytorch/pytorch --head-branch refs/heads/main --head-sha 1fc683cf17c8c673044538d10266c00f92987be2 --workflow-id 16976255019 --run-attempt 1 --job-id 48127805673 --job-name 'linux-jammy-rocm-py3.10 / test (default, 2, 6, linux.rocm.gpu.2)' 2025-08-14T23:46:12.9319142Z ##[group]Run set -eux 2025-08-14T23:46:12.9319686Z set -eux 2025-08-14T23:46:12.9320148Z  2025-08-14T23:46:12.9321179Z python3 "${GITHUB_ACTION_PATH}/../../scripts/benchmarks/gather_runners_info.py" 2025-08-14T23:46:12.9369731Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-08-14T23:46:12.9370609Z env: 2025-08-14T23:46:12.9371156Z GIT_DEFAULT_BRANCH: main 2025-08-14T23:46:12.9372157Z RUNNER_ARTIFACT_DIR: /home/pytorchci/actions-runner/_work/_temp/artifacts 2025-08-14T23:46:12.9373317Z RUNNER_TEST_RESULTS_DIR: /home/pytorchci/actions-runner/_work/_temp/test-results 2025-08-14T23:46:12.9374338Z RUNNER_DOCS_DIR: /home/pytorchci/actions-runner/_work/_temp/docs 2025-08-14T23:46:12.9375998Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --device /dev/dri --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-08-14T23:46:12.9377772Z AWS_DEFAULT_REGION: us-east-1 2025-08-14T23:46:12.9378327Z AWS_REGION: us-east-1 2025-08-14T23:46:12.9378932Z AWS_ACCESS_KEY_ID: *** 2025-08-14T23:46:12.9379962Z AWS_SECRET_ACCESS_KEY: *** 2025-08-14T23:46:12.9391475Z AWS_SESSION_TOKEN: *** 2025-08-14T23:46:12.9392366Z CONTAINER_NAME: 3d9c9b40c40dc5f1fb23a7678a8237a367285d971448e6116aec4b368a1c9e14 2025-08-14T23:46:12.9393292Z DEVICE_NAME: rocm 2025-08-14T23:46:12.9394007Z DEVICE_TYPE: AMD Instinct MI210 2025-08-14T23:46:12.9394602Z ##[endgroup] 2025-08-14T23:46:12.9465130Z + python3 /home/pytorchci/actions-runner/_work/_actions/pytorch/test-infra/main/.github/actions/upload-benchmark-results/../../scripts/benchmarks/gather_runners_info.py 2025-08-14T23:46:13.6533789Z /home/pytorchci/.local/lib/python3.10/site-packages/torch/_subclasses/functional_tensor.py:276: UserWarning: Failed to initialize NumPy: No module named 'numpy' (Triggered internally at /pytorch/torch/csrc/utils/tensor_numpy.cpp:81.) 2025-08-14T23:46:13.6535036Z cpu = _conversion_method_template(device=torch.device("cpu")) 2025-08-14T23:46:14.1968061Z ##[group]Run set -eux 2025-08-14T23:46:14.1968717Z set -eux 2025-08-14T23:46:14.1969233Z  2025-08-14T23:46:14.1969863Z # TODO (huydhn): Implement this part 2025-08-14T23:46:14.1970733Z echo "dependencies={}" >> "${GITHUB_OUTPUT}" 2025-08-14T23:46:14.2022785Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-08-14T23:46:14.2023527Z env: 2025-08-14T23:46:14.2023975Z GIT_DEFAULT_BRANCH: main 2025-08-14T23:46:14.2024776Z RUNNER_ARTIFACT_DIR: /home/pytorchci/actions-runner/_work/_temp/artifacts 2025-08-14T23:46:14.2025918Z RUNNER_TEST_RESULTS_DIR: /home/pytorchci/actions-runner/_work/_temp/test-results 2025-08-14T23:46:14.2027005Z RUNNER_DOCS_DIR: /home/pytorchci/actions-runner/_work/_temp/docs 2025-08-14T23:46:14.2028796Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --device /dev/dri --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-08-14T23:46:14.2030408Z AWS_DEFAULT_REGION: us-east-1 2025-08-14T23:46:14.2030989Z AWS_REGION: us-east-1 2025-08-14T23:46:14.2031622Z AWS_ACCESS_KEY_ID: *** 2025-08-14T23:46:14.2032366Z AWS_SECRET_ACCESS_KEY: *** 2025-08-14T23:46:14.2043574Z AWS_SESSION_TOKEN: *** 2025-08-14T23:46:14.2044402Z CONTAINER_NAME: 3d9c9b40c40dc5f1fb23a7678a8237a367285d971448e6116aec4b368a1c9e14 2025-08-14T23:46:14.2045280Z DEVICE_NAME: rocm 2025-08-14T23:46:14.2045803Z DEVICE_TYPE: AMD Instinct MI210 2025-08-14T23:46:14.2046411Z ##[endgroup] 2025-08-14T23:46:14.2133216Z + echo 'dependencies={}' 2025-08-14T23:46:14.2174992Z ##[group]Run set -eux 2025-08-14T23:46:14.2175311Z set -eux 2025-08-14T23:46:14.2175568Z  2025-08-14T23:46:14.2175868Z if [[ ! -d "${BENCHMARK_RESULTS_DIR}" ]]; then 2025-08-14T23:46:14.2176320Z  echo "${BENCHMARK_RESULTS_DIR} does not exist, skipping" 2025-08-14T23:46:14.2176803Z  # We don't want the job to fail if the directory doesn't exist 2025-08-14T23:46:14.2177197Z  exit 0 2025-08-14T23:46:14.2177439Z fi 2025-08-14T23:46:14.2177677Z  2025-08-14T23:46:14.2177947Z if [[ "${DRY_RUN}" == "true" ]]; then 2025-08-14T23:46:14.2178496Z  python3 "${GITHUB_ACTION_PATH}/../../scripts/upload_benchmark_results.py" \ 2025-08-14T23:46:14.2179586Z  --benchmark-results-dir "${BENCHMARK_RESULTS_DIR}" \ 2025-08-14T23:46:14.2180534Z  --metadata "${BENCHMARK_METADATA}" \ 2025-08-14T23:46:14.2181335Z  --runners "${RUNNER_INFO}" \ 2025-08-14T23:46:14.2182014Z  --dependencies "${DEPENDENCIES}" \ 2025-08-14T23:46:14.2182634Z  --dry-run 2025-08-14T23:46:14.2183133Z else 2025-08-14T23:46:14.2183859Z  python3 "${GITHUB_ACTION_PATH}/../../scripts/upload_benchmark_results.py" \ 2025-08-14T23:46:14.2185198Z  --benchmark-results-dir "${BENCHMARK_RESULTS_DIR}" \ 2025-08-14T23:46:14.2185996Z  --metadata "${BENCHMARK_METADATA}" \ 2025-08-14T23:46:14.2186647Z  --runners "${RUNNER_INFO}" \ 2025-08-14T23:46:14.2187671Z  --dependencies "${DEPENDENCIES}" 2025-08-14T23:46:14.2188311Z fi 2025-08-14T23:46:14.2238583Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-08-14T23:46:14.2239324Z env: 2025-08-14T23:46:14.2239762Z GIT_DEFAULT_BRANCH: main 2025-08-14T23:46:14.2240906Z RUNNER_ARTIFACT_DIR: /home/pytorchci/actions-runner/_work/_temp/artifacts 2025-08-14T23:46:14.2242049Z RUNNER_TEST_RESULTS_DIR: /home/pytorchci/actions-runner/_work/_temp/test-results 2025-08-14T23:46:14.2243106Z RUNNER_DOCS_DIR: /home/pytorchci/actions-runner/_work/_temp/docs 2025-08-14T23:46:14.2244854Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --device /dev/dri --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-08-14T23:46:14.2246506Z AWS_DEFAULT_REGION: us-east-1 2025-08-14T23:46:14.2247177Z AWS_REGION: us-east-1 2025-08-14T23:46:14.2247943Z AWS_ACCESS_KEY_ID: *** 2025-08-14T23:46:14.2248848Z AWS_SECRET_ACCESS_KEY: *** 2025-08-14T23:46:14.2256628Z AWS_SESSION_TOKEN: *** 2025-08-14T23:46:14.2257057Z CONTAINER_NAME: 3d9c9b40c40dc5f1fb23a7678a8237a367285d971448e6116aec4b368a1c9e14 2025-08-14T23:46:14.2257525Z DEVICE_NAME: rocm 2025-08-14T23:46:14.2257812Z DEVICE_TYPE: AMD Instinct MI210 2025-08-14T23:46:14.2258158Z BENCHMARK_RESULTS_DIR: test/test-reports 2025-08-14T23:46:14.2258523Z DRY_RUN: false 2025-08-14T23:46:14.2260741Z BENCHMARK_METADATA: {"timestamp": 1755215172, "schema_version": "v3", "name": "linux-jammy-rocm-py3.10 / test (default, 2, 6, linux.rocm.gpu.2)", "repo": "pytorch/pytorch", "head_branch": "refs/heads/main", "head_sha": "1fc683cf17c8c673044538d10266c00f92987be2", "workflow_id": 16976255019, "run_attempt": 1, "job_id": 48127805673} 2025-08-14T23:46:14.2263746Z RUNNER_INFO: [{"cpu_info": "x86_64", "cpu_count": 64, "avail_mem_in_gb": 125, "extra_info": {"hostname": "pytorch-rocm-hw-43"}, "name": "rocm", "type": "AMD Instinct MI210"}] 2025-08-14T23:46:14.2264977Z DEPENDENCIES: {} 2025-08-14T23:46:14.2265464Z ##[endgroup] 2025-08-14T23:46:14.2339336Z + [[ ! -d test/test-reports ]] 2025-08-14T23:46:14.2340084Z + [[ false == \t\r\u\e ]] 2025-08-14T23:46:14.2345036Z + python3 /home/pytorchci/actions-runner/_work/_actions/pytorch/test-infra/main/.github/actions/upload-benchmark-results/../../scripts/upload_benchmark_results.py --benchmark-results-dir test/test-reports --metadata '{"timestamp": 1755215172, "schema_version": "v3", "name": "linux-jammy-rocm-py3.10 / test (default, 2, 6, linux.rocm.gpu.2)", "repo": "pytorch/pytorch", "head_branch": "refs/heads/main", "head_sha": "1fc683cf17c8c673044538d10266c00f92987be2", "workflow_id": 16976255019, "run_attempt": 1, "job_id": 48127805673}' --runners '[{"cpu_info": "x86_64", "cpu_count": 64, "avail_mem_in_gb": 125, "extra_info": {"hostname": "pytorch-rocm-hw-43"}, "name": "rocm", "type": "AMD Instinct MI210"}]' --dependencies '{}' 2025-08-14T23:46:14.3743670Z /home/pytorchci/actions-runner/_work/_actions/pytorch/test-infra/main/.github/actions/upload-benchmark-results/../../scripts/upload_benchmark_results.py:236: UserWarning: {'included': [{'test_file': 'cpp/test_jit'}, {'test_file': 'cpp/test_lazy'}], 'excluded': []} from test/test-reports/td_exclusions-0aa5582532a68b4694ca.json is not a benchmark record, skipping 2025-08-14T23:46:14.3746747Z warn(f"{result} from {filepath} is not a benchmark record, skipping") 2025-08-14T23:46:14.3863132Z /home/pytorchci/actions-runner/_work/_actions/pytorch/test-infra/main/.github/actions/upload-benchmark-results/../../scripts/upload_benchmark_results.py:236: UserWarning: {'included': [{'test_file': 'test_public_bindings'}, {'test_file': 'inductor/test_aot_inductor'}, {'test_file': 'inductor/test_torchinductor'}, {'test_file': 'inductor/test_max_autotune'}, {'test_file': 'inductor/test_torchinductor_dynamic_shapes'}, {'test_file': 'inductor/test_torchinductor_codegen_dynamic_shapes'}, {'test_file': 'inductor/test_torchinductor_opinfo'}, {'test_file': 'inductor/test_cpu_repro'}, {'test_file': 'inductor/test_cuda_repro'}, {'test_file': 'dynamo/test_unspec'}, {'test_file': 'dynamo/test_repros'}, {'test_file': 'inductor/test_kernel_benchmark'}, {'test_file': 'dynamo/test_dynamic_shapes'}, {'test_file': 'inductor/test_cudagraph_trees'}, {'test_file': 'inductor/test_mkldnn_pattern_matcher'}, {'test_file': 'inductor/test_perf'}, {'test_file': 'inductor/test_pattern_matcher'}, {'test_file': 'inductor/test_fused_attention'}, {'test_file': 'inductor/test_inductor_freezing'}, {'test_file': 'dynamo/test_misc'}, {'test_file': 'dynamo/test_higher_order_ops'}, {'test_file': 'dynamo/test_modules'}, {'test_file': 'dynamo/test_backends'}, {'test_file': 'dynamo/test_activation_checkpointing'}, {'test_file': 'dynamo/test_logging'}, {'test_file': 'inductor/test_select_algorithm'}, {'test_file': 'inductor/test_dependencies'}, {'test_file': 'inductor/test_compiled_optimizers'}, {'test_file': 'inductor/test_compiled_autograd'}, {'test_file': 'dynamo/test_ctx_manager'}, {'test_file': 'inductor/test_snode_runtime'}, {'test_file': 'inductor/test_coordinate_descent_tuner'}, {'test_file': 'inductor/test_foreach'}, {'test_file': 'inductor/test_extension_backend'}, {'test_file': 'dynamo/test_after_aot'}, {'test_file': 'dynamo/test_aot_autograd'}, {'test_file': 'dynamo/test_cudagraphs'}, {'test_file': 'dynamo/test_exc'}, {'test_file': 'inductor/test_binary_folding'}, {'test_file': 'inductor/test_config'}, {'test_file': 'inductor/test_custom_lowering'}, {'test_file': 'inductor/test_group_batch_fusion'}, {'test_file': 'inductor/test_layout_optim'}, {'test_file': 'inductor/test_minifier'}, {'test_file': 'inductor/test_mmdecomp'}, {'test_file': 'inductor/test_smoke'}, {'test_file': 'inductor/test_split_cat_fx_passes'}, {'test_file': 'inductor/test_triton_wrapper'}, {'test_file': 'inductor/test_flex_attention'}, {'test_file': 'inductor/test_control_flow'}, {'test_file': 'inductor/test_padding'}, {'test_file': 'inductor/test_aot_inductor_arrayref'}, {'test_file': 'inductor/test_halide'}, {'test_file': 'inductor/test_unbacked_symints'}, {'test_file': 'inductor/test_triton_kernels'}, {'test_file': 'inductor/test_torchinductor_strided_blocks'}, {'test_file': 'inductor/test_cpu_select_algorithm'}, {'test_file': 'inductor/test_aot_inductor_custom_ops'}, {'test_file': 'inductor/test_triton_cpu_backend'}, {'test_file': 'inductor/test_alignment'}, {'test_file': 'inductor/test_flex_decoding'}, {'test_file': 'inductor/test_torchbind'}, {'test_file': 'export/test_export'}, {'test_file': 'inductor/test_memory'}, {'test_file': 'inductor/test_benchmark_fusion'}, {'test_file': 'inductor/test_multi_kernel'}, {'test_file': 'inductor/test_inplace_padding'}, {'test_file': 'dynamo/test_functions'}, {'test_file': 'inductor/test_provenance_tracing'}, {'test_file': 'inductor/test_online_softmax'}, {'test_file': 'inductor/test_subgraph_choice'}, {'test_file': 'export/test_torchbind'}, {'test_file': 'inductor/test_cutlass_backend'}, {'test_file': 'dynamo/test_einops'}, {'test_file': 'inductor/test_external_callables'}, {'test_file': 'inductor/test_memory_planning'}, {'test_file': 'inductor/test_loop_ordering'}, {'test_file': 'inductor/test_fp8'}, {'test_file': 'inductor/test_combo_kernels'}, {'test_file': 'inductor/test_cpu_cpp_wrapper'}, {'test_file': 'inductor/test_fxir_backend'}, {'test_file': 'functorch/test_eager_transforms'}, {'test_file': 'inductor/test_cooperative_reductions'}, {'test_file': 'dynamo/test_decorators'}, {'test_file': 'inductor/test_triton_syntax'}, {'test_file': 'inductor/test_codecache'}, {'test_file': 'inductor/test_debug_trace'}, {'test_file': 'inductor/test_op_dtype_prop'}, {'test_file': 'inductor/test_pad_mm'}, {'test_file': 'export/test_nativert'}, {'test_file': 'test_custom_ops'}, {'test_file': 'inductor/test_triton_heuristics'}, {'test_file': 'inductor/test_fuzzer'}, {'test_file': 'dynamo/test_autograd_function'}, {'test_file': 'inductor/test_cpp_wrapper_hipify'}, {'test_file': 'inductor/test_profiler'}, {'test_file': 'export/test_serdes'}, {'test_file': 'inductor/test_ck_backend'}, {'test_file': 'dynamo/test_graph_deduplication'}, {'test_file': 'export/test_serialize'}, {'test_file': 'inductor/test_mps_basic'}, {'test_file': 'inductor/test_compile_subprocess'}, {'test_file': 'test_testing'}, {'test_file': 'inductor/test_cutlass_evt'}, {'test_file': 'export/test_retraceability'}, {'test_file': 'test_content_store'}, {'test_file': 'export/test_cpp_serdes'}, {'test_file': 'export/test_export_training_ir_to_run_decomp'}, {'test_file': 'inductor/test_aot_inductor_package'}, {'test_file': 'inductor/test_analysis'}, {'test_file': 'export/test_unflatten'}, {'test_file': 'dynamo/test_interop'}, {'test_file': 'inductor/test_quantization'}, {'test_file': 'dynamo/test_fake_distributed'}, {'test_file': 'inductor/test_gpu_cpp_wrapper'}, {'test_file': 'dynamo/test_export'}, {'test_file': 'dynamo/test_subclasses'}, {'test_file': 'export/test_export_strict'}, {'test_file': 'export/test_export_with_inline_and_install'}, {'test_file': 'inductor/test_compile_worker'}, {'test_file': 'export/test_unflatten_training_ir'}, {'test_file': 'test_model_exports_to_core_aten'}, {'test_file': 'test_quantization'}, {'test_file': 'inductor/test_async_compile'}, {'test_file': 'inductor/test_static_cuda_launcher'}, {'test_file': 'dynamo/test_error_messages'}, {'test_file': 'dynamo/test_fx_graph_runnable'}, {'test_file': 'inductor/test_remote_cache'}, {'test_file': 'dynamo/test_aot_autograd_cache'}, {'test_file': 'dynamo/test_backward_higher_order_ops'}, {'test_file': 'dynamo/test_base_hop'}, {'test_file': 'dynamo/test_base_output'}, {'test_file': 'dynamo/test_buffers_override'}, {'test_file': 'dynamo/test_bytecode_utils'}, {'test_file': 'dynamo/test_callback'}, {'test_file': 'dynamo/test_compile'}, {'test_file': 'dynamo/test_compiler_bisector'}, {'test_file': 'dynamo/test_comptime'}, {'test_file': 'dynamo/test_config'}, {'test_file': 'dynamo/test_cudagraphs_expandable_segments'}, {'test_file': 'dynamo/test_debug_utils'}, {'test_file': 'dynamo/test_deque_reconstruct'}, {'test_file': 'dynamo/test_deviceguard'}, {'test_file': 'dynamo/test_dicts'}, {'test_file': 'dynamo/test_exceptions'}, {'test_file': 'dynamo/test_export_mutations'}, {'test_file': 'dynamo/test_flat_apply'}, {'test_file': 'dynamo/test_frame_init'}, {'test_file': 'dynamo/test_fx_passes_pre_grad'}, {'test_file': 'dynamo/test_generator'}, {'test_file': 'dynamo/test_global'}, {'test_file': 'dynamo/test_graph_region_tracker'}, {'test_file': 'dynamo/test_guard_manager'}, {'test_file': 'dynamo/test_guard_serialization'}, {'test_file': 'dynamo/test_hooks'}, {'test_file': 'dynamo/test_inline_and_install'}, {'test_file': 'dynamo/test_input_attr_tracking'}, {'test_file': 'dynamo/test_install_free_tensors'}, {'test_file': 'dynamo/test_list'}, {'test_file': 'dynamo/test_metrics_context'}, {'test_file': 'dynamo/test_minifier'}, {'test_file': 'dynamo/test_model_output'}, {'test_file': 'dynamo/test_modes'}, {'test_file': 'dynamo/test_nops'}, {'test_file': 'dynamo/test_optimizers'}, {'test_file': 'dynamo/test_package'}, {'test_file': 'dynamo/test_pgo'}, {'test_file': 'dynamo/test_pre_dispatch'}, {'test_file': 'dynamo/test_precompile_context'}, {'test_file': 'dynamo/test_profiler'}, {'test_file': 'dynamo/test_python_autograd'}, {'test_file': 'dynamo/test_python_dispatcher'}, {'test_file': 'dynamo/test_recompile_ux'}, {'test_file': 'dynamo/test_recompiles'}, {'test_file': 'dynamo/test_reconstruct'}, {'test_file': 'dynamo/test_reorder_logs'}, {'test_file': 'dynamo/test_resume'}, {'test_file': 'dynamo/test_sdpa'}, {'test_file': 'dynamo/test_sets'}, {'test_file': 'dynamo/test_skip_guard_eval_unsafe'}, {'test_file': 'dynamo/test_skip_non_tensor'}, {'test_file': 'dynamo/test_sources'}, {'test_file': 'dynamo/test_structured_trace'}, {'test_file': 'dynamo/test_subgraphs'}, {'test_file': 'dynamo/test_torchrec'}, {'test_file': 'dynamo/test_trace_rules'}, {'test_file': 'dynamo/test_unittest'}, {'test_file': 'dynamo/test_utils'}, {'test_file': 'dynamo/test_verify_correctness'}, {'test_file': 'dynamo/test_view'}, {'test_file': 'export/test_converter'}, {'test_file': 'export/test_db'}, {'test_file': 'export/test_draft_export'}, {'test_file': 'export/test_experimental'}, {'test_file': 'export/test_functionalized_assertions'}, {'test_file': 'export/test_hop'}, {'test_file': 'export/test_lift_unlift'}, {'test_file': 'export/test_package'}, {'test_file': 'export/test_pass_infra'}, {'test_file': 'export/test_passes'}, {'test_file': 'export/test_schema'}, {'test_file': 'export/test_sparse'}, {'test_file': 'export/test_swap'}, {'test_file': 'export/test_tools'}, {'test_file': 'export/test_tree_utils'}, {'test_file': 'export/test_upgrader'}, {'test_file': 'export/test_verifier'}, {'test_file': 'inductor/test_aot_inductor_utils'}, {'test_file': 'inductor/test_auto_functionalize'}, {'test_file': 'inductor/test_autoheuristic'}, {'test_file': 'inductor/test_b2b_gemm'}, {'test_file': 'inductor/test_benchmarking'}, {'test_file': 'inductor/test_best_config'}, {'test_file': 'inductor/test_block_analysis'}, {'test_file': 'inductor/test_codegen_triton'}, {'test_file': 'inductor/test_compile'}, {'test_file': 'inductor/test_cudacodecache'}, {'test_file': 'inductor/test_cudagraph_trees_expandable_segments'}, {'test_file': 'inductor/test_custom_post_grad_passes'}, {'test_file': 'inductor/test_decompose_mem_bound_mm'}, {'test_file': 'inductor/test_distributed_patterns'}, {'test_file': 'inductor/test_efficient_conv_bn_eval'}, {'test_file': 'inductor/test_fx_fusion'}, {'test_file': 'inductor/test_graph_transform_observer'}, {'test_file': 'inductor/test_helion_kernels'}, {'test_file': 'inductor/test_indexing'}, {'test_file': 'inductor/test_inductor_annotations'}, {'test_file': 'inductor/test_inductor_scheduler'}, {'test_file': 'inductor/test_inductor_utils'}, {'test_file': 'inductor/test_inplacing_pass'}, {'test_file': 'inductor/test_kernel_optimization'}, {'test_file': 'inductor/test_metrics'}, {'test_file': 'inductor/test_minifier_isolate'}, {'test_file': 'inductor/test_minifier_utils'}, {'test_file': 'inductor/test_move_constructors_to_cuda'}, {'test_file': 'inductor/test_needs_exact_strides'}, {'test_file': 'inductor/test_op_completeness'}, {'test_file': 'inductor/test_ordered_set'}, {'test_file': 'inductor/test_scatter_optimization'}, {'test_file': 'inductor/test_split_cat_fx_aten_passes'}, {'test_file': 'inductor/test_torchinductor_codegen_config_overrides'}, {'test_file': 'inductor/test_triton_extension_backend'}, {'test_file': 'inductor/test_utils'}, {'test_file': 'inductor/test_xpu_basic'}, {'test_file': 'test_functionalization_of_rng_ops'}, {'test_file': 'test_sparse_semi_structured'}, {'test_file': 'test_dynamic_shapes'}, {'test_file': 'higher_order_ops/test_invoke_subgraph'}, {'test_file': 'functorch/test_control_flow'}, {'test_file': 'test_torch'}, {'test_file': 'test_reductions'}, {'test_file': 'test_fake_tensor'}, {'test_file': 'test_ops'}, {'test_file': 'test_matmul_cuda'}, {'test_file': 'test_linalg'}, {'test_file': 'test_nestedtensor'}, {'test_file': 'test_modules'}, {'test_file': 'test_proxy_tensor'}, {'test_file': 'test_hop_infra'}, {'test_file': 'test_fx'}, {'test_file': 'test_foreach'}, {'test_file': 'functorch/test_aotdispatch'}, {'test_file': 'benchmark_utils/test_benchmark_utils'}, {'test_file': 'test_decomp'}, {'test_file': 'test_expanded_weights'}, {'test_file': 'distributions/test_distributions'}, {'test_file': 'doctests'}, {'test_file': 'test_cpp_api_parity'}, {'test_file': 'test_ops_gradients'}, {'test_file': 'profiler/test_cpp_thread'}, {'test_file': 'test_autoload_enable'}, {'test_file': 'test_nn'}, {'test_file': 'test_tensorboard'}, {'test_file': 'test_transformers_privateuse1'}, {'test_file': 'test_cpp_extensions_mtia_backend'}, {'test_file': 'test_autograd'}, {'test_file': 'test_autograd_fallback'}, {'test_file': 'test_jit'}, {'test_file': 'profiler/test_memory_profiler'}, {'test_file': 'functorch/test_dims'}, {'test_file': 'functorch/test_ops'}, {'test_file': 'nn/test_parametrization'}, {'test_file': 'profiler/test_kineto'}, {'test_file': 'profiler/test_profiler'}, {'test_file': 'test_ci_sanity_check_fail'}, {'test_file': 'test_cuda_multigpu'}, {'test_file': 'test_dataloader'}, {'test_file': 'test_jit_fuser_te'}, {'test_file': 'test_ops_jit'}, {'test_file': 'test_overrides'}, {'test_file': 'test_type_hints'}, {'test_file': 'test_sparse'}, {'test_file': 'functorch/test_ac_logging'}, {'test_file': 'test_numa_binding'}, {'test_file': 'test_cpp_extensions_aot_no_ninja'}, {'test_file': 'test_datapipe'}, {'test_file': 'backends/xeon/test_launch'}, {'test_file': 'cpp_extensions/libtorch_agnostic_extension/test/test_libtorch_agnostic'}, {'test_file': 'cpp_extensions/python_agnostic_extension/test/test_python_agnostic'}, {'test_file': 'distributions/test_constraints'}, {'test_file': 'functorch/test_ac'}, {'test_file': 'functorch/test_ac_knapsack'}, {'test_file': 'functorch/test_aot_joint_with_descriptors'}, {'test_file': 'functorch/test_logging'}, {'test_file': 'functorch/test_memory_efficient_fusion'}, {'test_file': 'functorch/test_minifier'}, {'test_file': 'functorch/test_parsing'}, {'test_file': 'functorch/test_rearrange'}, {'test_file': 'functorch/test_vmap'}, {'test_file': 'functorch/test_vmap_registrations'}, {'test_file': 'higher_order_ops/test_invoke_quant'}, {'test_file': 'higher_order_ops/test_with_effects'}, {'test_file': 'lazy/test_bindings'}, {'test_file': 'lazy/test_debug_util'}, {'test_file': 'lazy/test_functionalization'}, {'test_file': 'lazy/test_generator'}, {'test_file': 'lazy/test_reuse_ir'}, {'test_file': 'lazy/test_step_closures'}, {'test_file': 'lazy/test_ts_opinfo'}, {'test_file': 'nn/test_convolution'}, {'test_file': 'nn/test_dropout'}, {'test_file': 'nn/test_embedding'}, {'test_file': 'nn/test_init'}, {'test_file': 'nn/test_lazy_modules'}, {'test_file': 'nn/test_load_state_dict'}, {'test_file': 'nn/test_module_hooks'}, {'test_file': 'nn/test_multihead_attention'}, {'test_file': 'nn/test_packed_sequence'}, {'test_file': 'nn/test_pooling'}, {'test_file': 'nn/test_pruning'}, {'test_file': 'optim/test_lrscheduler'}, {'test_file': 'optim/test_optim'}, {'test_file': 'optim/test_swa_utils'}, {'test_file': 'profiler/test_execution_trace'}, {'test_file': 'profiler/test_profiler_tree'}, {'test_file': 'profiler/test_python_tracer'}, {'test_file': 'profiler/test_record_function'}, {'test_file': 'profiler/test_torch_tidy'}, {'test_file': 'test_accelerator'}, {'test_file': 'test_ao_sparsity'}, {'test_file': 'test_appending_byte_serializer'}, {'test_file': 'test_autocast'}, {'test_file': 'test_autoload'}, {'test_file': 'test_autoload_disable'}, {'test_file': 'test_binary_ufuncs'}, {'test_file': 'test_bundled_inputs'}, {'test_file': 'test_comparison_utils'}, {'test_file': 'test_compile_benchmark_util'}, {'test_file': 'test_complex'}, {'test_file': 'test_cpp_extensions_aot_ninja'}, {'test_file': 'test_cpp_extensions_jit'}, {'test_file': 'test_cpp_extensions_stream_and_event'}, {'test_file': 'test_cuda'}, {'test_file': 'test_cuda_expandable_segments'}, {'test_file': 'test_cuda_primary_ctx'}, {'test_file': 'test_cuda_sanitizer'}, {'test_file': 'test_cuda_trace'}, {'test_file': 'test_dispatch'}, {'test_file': 'test_dlpack'}, {'test_file': 'test_extension_utils'}, {'test_file': 'test_file_check'}, {'test_file': 'test_flop_counter'}, {'test_file': 'test_function_schema'}, {'test_file': 'test_functional_autograd_benchmark'}, {'test_file': 'test_functional_optim'}, {'test_file': 'test_functionalization'}, {'test_file': 'test_futures'}, {'test_file': 'test_fx_experimental'}, {'test_file': 'test_fx_passes'}, {'test_file': 'test_fx_reinplace_pass'}, {'test_file': 'test_hub'}, {'test_file': 'test_import_stats'}, {'test_file': 'test_indexing'}, {'test_file': 'test_itt'}, {'test_file': 'test_jit_autocast'}, {'test_file': 'test_jit_disabled'}, {'test_file': 'test_jit_llga_fuser'}, {'test_file': 'test_jiterator'}, {'test_file': 'test_legacy_vmap'}, {'test_file': 'test_license'}, {'test_file': 'test_logging'}, {'test_file': 'test_masked'}, {'test_file': 'test_maskedtensor'}, {'test_file': 'test_meta'}, {'test_file': 'test_mkl_verbose'}, {'test_file': 'test_mkldnn'}, {'test_file': 'test_mkldnn_fusion'}, {'test_file': 'test_mkldnn_verbose'}, {'test_file': 'test_mobile_optimizer'}, {'test_file': 'test_module_tracker'}, {'test_file': 'test_monitor'}, {'test_file': 'test_multiprocessing'}, {'test_file': 'test_multiprocessing_spawn'}, {'test_file': 'test_namedtensor'}, {'test_file': 'test_namedtuple_return_api'}, {'test_file': 'test_native_functions'}, {'test_file': 'test_native_mha'}, {'test_file': 'test_numba_integration'}, {'test_file': 'test_numpy_interop'}, {'test_file': 'test_openmp'}, {'test_file': 'test_openreg'}, {'test_file': 'test_ops_fwd_gradients'}, {'test_file': 'test_optim'}, {'test_file': 'test_out_dtype_op'}, {'test_file': 'test_package'}, {'test_file': 'test_per_overload_api'}, {'test_file': 'test_prims'}, {'test_file': 'test_pruning_op'}, {'test_file': 'test_python_dispatch'}, {'test_file': 'test_pytree'}, {'test_file': 'test_rename_privateuse1_to_existing_device'}, {'test_file': 'test_scatter_gather_ops'}, {'test_file': 'test_schema_check'}, {'test_file': 'test_segment_reductions'}, {'test_file': 'test_serialization'}, {'test_file': 'test_set_default_mobile_cpu_allocator'}, {'test_file': 'test_shape_ops'}, {'test_file': 'test_show_pickle'}, {'test_file': 'test_sort_and_select'}, {'test_file': 'test_sparse_csr'}, {'test_file': 'test_spectral_ops'}, {'test_file': 'test_stateless'}, {'test_file': 'test_subclass'}, {'test_file': 'test_sympy_utils'}, {'test_file': 'test_tensor_creation_ops'}, {'test_file': 'test_tensorexpr'}, {'test_file': 'test_tensorexpr_pybind'}, {'test_file': 'test_transformers'}, {'test_file': 'test_type_info'}, {'test_file': 'test_type_promotion'}, {'test_file': 'test_typing'}, {'test_file': 'test_unary_ufuncs'}, {'test_file': 'test_utils'}, {'test_file': 'test_utils_config_module'}, {'test_file': 'test_utils_filelock'}, {'test_file': 'test_view_ops'}, {'test_file': 'test_vulkan'}, {'test_file': 'test_weak'}, {'test_file': 'test_xnnpack_integration'}, {'test_file': 'torch_np/numpy_tests/core/test_dlpack'}, {'test_file': 'torch_np/numpy_tests/core/test_dtype'}, {'test_file': 'torch_np/numpy_tests/core/test_einsum'}, {'test_file': 'torch_np/numpy_tests/core/test_getlimits'}, {'test_file': 'torch_np/numpy_tests/core/test_indexing'}, {'test_file': 'torch_np/numpy_tests/core/test_multiarray'}, {'test_file': 'torch_np/numpy_tests/core/test_numeric'}, {'test_file': 'torch_np/numpy_tests/core/test_numerictypes'}, {'test_file': 'torch_np/numpy_tests/core/test_scalar_ctors'}, {'test_file': 'torch_np/numpy_tests/core/test_scalar_methods'}, {'test_file': 'torch_np/numpy_tests/core/test_scalarinherit'}, {'test_file': 'torch_np/numpy_tests/core/test_scalarmath'}, {'test_file': 'torch_np/numpy_tests/core/test_shape_base'}, {'test_file': 'torch_np/numpy_tests/fft/test_helper'}, {'test_file': 'torch_np/numpy_tests/fft/test_pocketfft'}, {'test_file': 'torch_np/numpy_tests/lib/test_arraypad'}, {'test_file': 'torch_np/numpy_tests/lib/test_arraysetops'}, {'test_file': 'torch_np/numpy_tests/lib/test_function_base'}, {'test_file': 'torch_np/numpy_tests/lib/test_histograms'}, {'test_file': 'torch_np/numpy_tests/lib/test_index_tricks'}, {'test_file': 'torch_np/numpy_tests/lib/test_shape_base_'}, {'test_file': 'torch_np/numpy_tests/lib/test_twodim_base'}, {'test_file': 'torch_np/numpy_tests/lib/test_type_check'}, {'test_file': 'torch_np/numpy_tests/linalg/test_linalg'}, {'test_file': 'torch_np/test_basic'}, {'test_file': 'torch_np/test_binary_ufuncs'}, {'test_file': 'torch_np/test_dtype'}, {'test_file': 'torch_np/test_function_base'}, {'test_file': 'torch_np/test_indexing'}, {'test_file': 'torch_np/test_ndarray_methods'}, {'test_file': 'torch_np/test_nep50_examples'}, {'test_file': 'torch_np/test_random'}, {'test_file': 'torch_np/test_reductions'}, {'test_file': 'torch_np/test_scalars_0D_arrays'}, {'test_file': 'torch_np/test_ufuncs_basic'}, {'test_file': 'torch_np/test_unary_ufuncs'}, {'test_file': 'typing/test_python_operators'}, {'test_file': 'xpu/test_conv'}, {'test_file': 'xpu/test_fusion'}, {'test_file': 'xpu/test_gemm'}], 'excluded': []} from test/test-reports/td_exclusions-f90b83a2c9c0b3ebea5a.json is not a benchmark record, skipping 2025-08-14T23:46:14.3980289Z warn(f"{result} from {filepath} is not a benchmark record, skipping") 2025-08-14T23:46:14.4074431Z Prepare all required actions 2025-08-14T23:46:14.4075262Z Getting action download info 2025-08-14T23:46:14.4124397Z ##[group]Run ./.github/actions/teardown-rocm 2025-08-14T23:46:14.4124790Z env: 2025-08-14T23:46:14.4125240Z GIT_DEFAULT_BRANCH: main 2025-08-14T23:46:14.4126027Z RUNNER_ARTIFACT_DIR: /home/pytorchci/actions-runner/_work/_temp/artifacts 2025-08-14T23:46:14.4127404Z RUNNER_TEST_RESULTS_DIR: /home/pytorchci/actions-runner/_work/_temp/test-results 2025-08-14T23:46:14.4128423Z RUNNER_DOCS_DIR: /home/pytorchci/actions-runner/_work/_temp/docs 2025-08-14T23:46:14.4130169Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --device /dev/dri --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-08-14T23:46:14.4131956Z AWS_DEFAULT_REGION: us-east-1 2025-08-14T23:46:14.4132618Z AWS_REGION: us-east-1 2025-08-14T23:46:14.4133331Z AWS_ACCESS_KEY_ID: *** 2025-08-14T23:46:14.4134253Z AWS_SECRET_ACCESS_KEY: *** 2025-08-14T23:46:14.4145384Z AWS_SESSION_TOKEN: *** 2025-08-14T23:46:14.4146198Z CONTAINER_NAME: 3d9c9b40c40dc5f1fb23a7678a8237a367285d971448e6116aec4b368a1c9e14 2025-08-14T23:46:14.4147089Z DEVICE_NAME: rocm 2025-08-14T23:46:14.4147601Z DEVICE_TYPE: AMD Instinct MI210 2025-08-14T23:46:14.4148165Z ##[endgroup] 2025-08-14T23:46:14.4179973Z ##[group]Run # ignore expansion of "docker ps -q" since it could be empty 2025-08-14T23:46:14.4181005Z # ignore expansion of "docker ps -q" since it could be empty 2025-08-14T23:46:14.4181825Z # shellcheck disable=SC2046 2025-08-14T23:46:14.4182496Z docker stop $(docker ps -q) || true 2025-08-14T23:46:14.4183184Z # Prune all stopped containers. 2025-08-14T23:46:14.4183857Z docker container prune -f 2025-08-14T23:46:14.4231423Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-08-14T23:46:14.4232162Z env: 2025-08-14T23:46:14.4232603Z GIT_DEFAULT_BRANCH: main 2025-08-14T23:46:14.4233417Z RUNNER_ARTIFACT_DIR: /home/pytorchci/actions-runner/_work/_temp/artifacts 2025-08-14T23:46:14.4234598Z RUNNER_TEST_RESULTS_DIR: /home/pytorchci/actions-runner/_work/_temp/test-results 2025-08-14T23:46:14.4235646Z RUNNER_DOCS_DIR: /home/pytorchci/actions-runner/_work/_temp/docs 2025-08-14T23:46:14.4237395Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --device /dev/dri --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-08-14T23:46:14.4238989Z AWS_DEFAULT_REGION: us-east-1 2025-08-14T23:46:14.4239551Z AWS_REGION: us-east-1 2025-08-14T23:46:14.4240175Z AWS_ACCESS_KEY_ID: *** 2025-08-14T23:46:14.4241082Z AWS_SECRET_ACCESS_KEY: *** 2025-08-14T23:46:14.4252151Z AWS_SESSION_TOKEN: *** 2025-08-14T23:46:14.4253063Z CONTAINER_NAME: 3d9c9b40c40dc5f1fb23a7678a8237a367285d971448e6116aec4b368a1c9e14 2025-08-14T23:46:14.4254105Z DEVICE_NAME: rocm 2025-08-14T23:46:14.4254722Z DEVICE_TYPE: AMD Instinct MI210 2025-08-14T23:46:14.4255375Z ##[endgroup] 2025-08-14T23:46:25.3655559Z 3d9c9b40c40d 2025-08-14T23:46:56.0250745Z Deleted Containers: 2025-08-14T23:46:56.0251777Z 3d9c9b40c40dc5f1fb23a7678a8237a367285d971448e6116aec4b368a1c9e14 2025-08-14T23:46:56.0255791Z 2025-08-14T23:46:56.0256451Z Total reclaimed space: 12.2GB 2025-08-14T23:46:56.0340057Z Prepare all required actions 2025-08-14T23:46:56.0392041Z ##[group]Run ./.github/actions/diskspace-cleanup 2025-08-14T23:46:56.0392656Z with: 2025-08-14T23:46:56.0393082Z diskspace-cutoff: 70 2025-08-14T23:46:56.0393542Z env: 2025-08-14T23:46:56.0393950Z GIT_DEFAULT_BRANCH: main 2025-08-14T23:46:56.0394707Z RUNNER_ARTIFACT_DIR: /home/pytorchci/actions-runner/_work/_temp/artifacts 2025-08-14T23:46:56.0396052Z RUNNER_TEST_RESULTS_DIR: /home/pytorchci/actions-runner/_work/_temp/test-results 2025-08-14T23:46:56.0397026Z RUNNER_DOCS_DIR: /home/pytorchci/actions-runner/_work/_temp/docs 2025-08-14T23:46:56.0399049Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --device /dev/dri --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-08-14T23:46:56.0400666Z AWS_DEFAULT_REGION: us-east-1 2025-08-14T23:46:56.0401217Z AWS_REGION: us-east-1 2025-08-14T23:46:56.0401838Z AWS_ACCESS_KEY_ID: *** 2025-08-14T23:46:56.0402774Z AWS_SECRET_ACCESS_KEY: *** 2025-08-14T23:46:56.0413202Z AWS_SESSION_TOKEN: *** 2025-08-14T23:46:56.0413974Z CONTAINER_NAME: 3d9c9b40c40dc5f1fb23a7678a8237a367285d971448e6116aec4b368a1c9e14 2025-08-14T23:46:56.0414776Z DEVICE_NAME: rocm 2025-08-14T23:46:56.0415254Z DEVICE_TYPE: AMD Instinct MI210 2025-08-14T23:46:56.0415768Z ##[endgroup] 2025-08-14T23:46:56.0443617Z ##[group]Run set -ex 2025-08-14T23:46:56.0444119Z set -ex 2025-08-14T23:46:56.0444574Z diskspace_cutoff=70 2025-08-14T23:46:56.0445250Z docker_root_dir=$(docker info -f '{{.DockerRootDir}}') 2025-08-14T23:46:56.0446004Z if [ ! -d "$docker_root_dir" ]; then 2025-08-14T23:46:56.0446916Z  echo "Docker root directory ($docker_root_dir) does not exist. Skipping disk space check." 2025-08-14T23:46:56.0447783Z  exit 0 2025-08-14T23:46:56.0448223Z fi 2025-08-14T23:46:56.0448966Z diskspace=$(df -H --output=pcent ${docker_root_dir} | sed -n 2p | sed 's/%//' | sed 's/ //') 2025-08-14T23:46:56.0450478Z msg="Please file an issue on pytorch/pytorch reporting the faulty runner. Include a link to the runner logs so the runner can be identified" 2025-08-14T23:46:56.0451817Z if [[ "$diskspace" -ge "$diskspace_cutoff" ]] ; then 2025-08-14T23:46:56.0452482Z  docker system prune -af 2025-08-14T23:46:56.0453385Z  diskspace_new=$(df -H --output=pcent ${docker_root_dir} | sed -n 2p | sed 's/%//' | sed 's/ //') 2025-08-14T23:46:56.0454392Z  if [[ "$diskspace_new" -gt "$diskspace_cutoff" ]] ; then 2025-08-14T23:46:56.0455470Z  echo "Error: Available diskspace is less than $diskspace_cutoff percent. Not enough diskspace." 2025-08-14T23:46:56.0456399Z  echo "$msg" 2025-08-14T23:46:56.0456894Z  exit 1 2025-08-14T23:46:56.0457345Z  else 2025-08-14T23:46:56.0457874Z  difference=$((diskspace - diskspace_new)) 2025-08-14T23:46:56.0458608Z  echo "Diskspace saved: $difference percent" 2025-08-14T23:46:56.0459213Z  fi 2025-08-14T23:46:56.0459620Z fi 2025-08-14T23:46:56.0507694Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-08-14T23:46:56.0508386Z env: 2025-08-14T23:46:56.0508810Z GIT_DEFAULT_BRANCH: main 2025-08-14T23:46:56.0509548Z RUNNER_ARTIFACT_DIR: /home/pytorchci/actions-runner/_work/_temp/artifacts 2025-08-14T23:46:56.0510608Z RUNNER_TEST_RESULTS_DIR: /home/pytorchci/actions-runner/_work/_temp/test-results 2025-08-14T23:46:56.0511585Z RUNNER_DOCS_DIR: /home/pytorchci/actions-runner/_work/_temp/docs 2025-08-14T23:46:56.0513223Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --device /dev/dri --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-08-14T23:46:56.0514689Z AWS_DEFAULT_REGION: us-east-1 2025-08-14T23:46:56.0515208Z AWS_REGION: us-east-1 2025-08-14T23:46:56.0515775Z AWS_ACCESS_KEY_ID: *** 2025-08-14T23:46:56.0516457Z AWS_SECRET_ACCESS_KEY: *** 2025-08-14T23:46:56.0526966Z AWS_SESSION_TOKEN: *** 2025-08-14T23:46:56.0527732Z CONTAINER_NAME: 3d9c9b40c40dc5f1fb23a7678a8237a367285d971448e6116aec4b368a1c9e14 2025-08-14T23:46:56.0528540Z DEVICE_NAME: rocm 2025-08-14T23:46:56.0529010Z DEVICE_TYPE: AMD Instinct MI210 2025-08-14T23:46:56.0529746Z ##[endgroup] 2025-08-14T23:46:56.0601263Z + diskspace_cutoff=70 2025-08-14T23:46:56.0608210Z ++ docker info -f '{{.DockerRootDir}}' 2025-08-14T23:46:56.1228461Z + docker_root_dir=/home/pytorchci/.local/share/docker 2025-08-14T23:46:56.1229542Z + '[' '!' -d /home/pytorchci/.local/share/docker ']' 2025-08-14T23:46:56.1237284Z ++ df -H --output=pcent /home/pytorchci/.local/share/docker 2025-08-14T23:46:56.1237926Z ++ sed -n 2p 2025-08-14T23:46:56.1238490Z ++ sed 's/ //' 2025-08-14T23:46:56.1238798Z ++ sed s/%// 2025-08-14T23:46:56.1262525Z + diskspace=61 2025-08-14T23:46:56.1263230Z + msg='Please file an issue on pytorch/pytorch reporting the faulty runner. Include a link to the runner logs so the runner can be identified' 2025-08-14T23:46:56.1264137Z + [[ 61 -ge 70 ]] 2025-08-14T23:46:56.1307417Z Post job cleanup. 2025-08-14T23:46:56.1347145Z Post job cleanup. 2025-08-14T23:46:56.2612440Z Post job cleanup. 2025-08-14T23:46:56.3093970Z Logging out of registry 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-08-14T23:46:56.3435530Z Post job cleanup. 2025-08-14T23:46:56.4709108Z Post job cleanup. 2025-08-14T23:46:56.4794683Z Post job cleanup. 2025-08-14T23:46:56.5739203Z [command]/usr/bin/git version 2025-08-14T23:46:56.5781387Z git version 2.34.1 2025-08-14T23:46:56.5816152Z Copying '/home/pytorchci/.gitconfig' to '/home/pytorchci/actions-runner/_work/_temp/601952ce-ace2-42ab-b731-71ef11c8c324/.gitconfig' 2025-08-14T23:46:56.5827508Z Temporarily overriding HOME='/home/pytorchci/actions-runner/_work/_temp/601952ce-ace2-42ab-b731-71ef11c8c324' before making global git config changes 2025-08-14T23:46:56.5829720Z Adding repository directory to the temporary git global config as a safe directory 2025-08-14T23:46:56.5831280Z [command]/usr/bin/git config --global --add safe.directory /home/pytorchci/actions-runner/_work/pytorch/pytorch 2025-08-14T23:46:56.5867031Z [command]/usr/bin/git config --local --name-only --get-regexp core\.sshCommand 2025-08-14T23:46:56.5909997Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'core\.sshCommand' && git config --local --unset-all 'core.sshCommand' || :" 2025-08-14T23:46:56.6277781Z Entering 'android/libs/fbjni' 2025-08-14T23:46:56.6337879Z Entering 'third_party/FP16' 2025-08-14T23:46:56.6385402Z Entering 'third_party/FXdiv' 2025-08-14T23:46:56.6424636Z Entering 'third_party/NNPACK' 2025-08-14T23:46:56.6463259Z Entering 'third_party/NVTX' 2025-08-14T23:46:56.6502057Z Entering 'third_party/VulkanMemoryAllocator' 2025-08-14T23:46:56.6540172Z Entering 'third_party/XNNPACK' 2025-08-14T23:46:56.6605691Z Entering 'third_party/aiter' 2025-08-14T23:46:56.6652135Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-08-14T23:46:56.6724165Z Entering 'third_party/benchmark' 2025-08-14T23:46:56.6780537Z Entering 'third_party/composable_kernel' 2025-08-14T23:46:56.6840809Z Entering 'third_party/cpp-httplib' 2025-08-14T23:46:56.6896582Z Entering 'third_party/cpuinfo' 2025-08-14T23:46:56.6941846Z Entering 'third_party/cudnn_frontend' 2025-08-14T23:46:56.6992208Z Entering 'third_party/cutlass' 2025-08-14T23:46:56.7049762Z Entering 'third_party/fbgemm' 2025-08-14T23:46:56.7094387Z Entering 'third_party/fbgemm/external/asmjit' 2025-08-14T23:46:56.7142749Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-08-14T23:46:56.7223941Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-08-14T23:46:56.7276907Z Entering 'third_party/fbgemm/external/cutlass' 2025-08-14T23:46:56.7327898Z Entering 'third_party/fbgemm/external/googletest' 2025-08-14T23:46:56.7390100Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-08-14T23:46:56.7445838Z Entering 'third_party/fbgemm/external/json' 2025-08-14T23:46:56.7518789Z Entering 'third_party/flash-attention' 2025-08-14T23:46:56.7596671Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-08-14T23:46:56.7652111Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-08-14T23:46:56.7709884Z Entering 'third_party/flatbuffers' 2025-08-14T23:46:56.7763174Z Entering 'third_party/fmt' 2025-08-14T23:46:56.7810804Z Entering 'third_party/gemmlowp/gemmlowp' 2025-08-14T23:46:56.7867966Z Entering 'third_party/gloo' 2025-08-14T23:46:56.7925842Z Entering 'third_party/googletest' 2025-08-14T23:46:56.7986875Z Entering 'third_party/ideep' 2025-08-14T23:46:56.8045663Z Entering 'third_party/ideep/mkl-dnn' 2025-08-14T23:46:56.8116230Z Entering 'third_party/ittapi' 2025-08-14T23:46:56.8176188Z Entering 'third_party/kineto' 2025-08-14T23:46:56.8230761Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-08-14T23:46:56.8289352Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-08-14T23:46:56.8351383Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-08-14T23:46:56.8405776Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-08-14T23:46:56.8452820Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-08-14T23:46:56.8508582Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-08-14T23:46:56.8570680Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-08-14T23:46:56.8610621Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-08-14T23:46:56.8647437Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-08-14T23:46:56.8693496Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-08-14T23:46:56.8749369Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-08-14T23:46:56.8805216Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-08-14T23:46:56.8859655Z Entering 'third_party/kleidiai' 2025-08-14T23:46:56.8927221Z Entering 'third_party/mimalloc' 2025-08-14T23:46:56.8979744Z Entering 'third_party/nlohmann' 2025-08-14T23:46:56.9040104Z Entering 'third_party/onnx' 2025-08-14T23:46:56.9131086Z Entering 'third_party/onnx/third_party/pybind11' 2025-08-14T23:46:56.9187310Z Entering 'third_party/opentelemetry-cpp' 2025-08-14T23:46:56.9252027Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-08-14T23:46:56.9311242Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-08-14T23:46:56.9361908Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-08-14T23:46:56.9413256Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-08-14T23:46:56.9450734Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-08-14T23:46:56.9486987Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-08-14T23:46:56.9522825Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-08-14T23:46:56.9572232Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-08-14T23:46:56.9624550Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-08-14T23:46:56.9682334Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-08-14T23:46:56.9746604Z Entering 'third_party/pocketfft' 2025-08-14T23:46:56.9810051Z Entering 'third_party/protobuf' 2025-08-14T23:46:56.9856603Z Entering 'third_party/protobuf/third_party/benchmark' 2025-08-14T23:46:56.9902987Z Entering 'third_party/protobuf/third_party/googletest' 2025-08-14T23:46:56.9974603Z Entering 'third_party/psimd' 2025-08-14T23:46:57.0025444Z Entering 'third_party/pthreadpool' 2025-08-14T23:46:57.0064469Z Entering 'third_party/pybind11' 2025-08-14T23:46:57.0106433Z Entering 'third_party/python-peachpy' 2025-08-14T23:46:57.0161964Z Entering 'third_party/sleef' 2025-08-14T23:46:57.0210850Z Entering 'third_party/tensorpipe' 2025-08-14T23:46:57.0253545Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-08-14T23:46:57.0311727Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-08-14T23:46:57.0364159Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-08-14T23:46:57.0416012Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-08-14T23:46:57.0468483Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-08-14T23:46:57.0564527Z [command]/usr/bin/git config --local --name-only --get-regexp http\.https\:\/\/github\.com\/\.extraheader 2025-08-14T23:46:57.0610098Z http.https://github.com/.extraheader 2025-08-14T23:46:57.0626384Z [command]/usr/bin/git config --local --unset-all http.https://github.com/.extraheader 2025-08-14T23:46:57.0686022Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'http\.https\:\/\/github\.com\/\.extraheader' && git config --local --unset-all 'http.https://github.com/.extraheader' || :" 2025-08-14T23:46:57.1058334Z Entering 'android/libs/fbjni' 2025-08-14T23:46:57.1089413Z http.https://github.com/.extraheader 2025-08-14T23:46:57.1143975Z Entering 'third_party/FP16' 2025-08-14T23:46:57.1182345Z http.https://github.com/.extraheader 2025-08-14T23:46:57.1231728Z Entering 'third_party/FXdiv' 2025-08-14T23:46:57.1271972Z http.https://github.com/.extraheader 2025-08-14T23:46:57.1321333Z Entering 'third_party/NNPACK' 2025-08-14T23:46:57.1370303Z http.https://github.com/.extraheader 2025-08-14T23:46:57.1401162Z Entering 'third_party/NVTX' 2025-08-14T23:46:57.1428605Z http.https://github.com/.extraheader 2025-08-14T23:46:57.1469026Z Entering 'third_party/VulkanMemoryAllocator' 2025-08-14T23:46:57.1500491Z http.https://github.com/.extraheader 2025-08-14T23:46:57.1543281Z Entering 'third_party/XNNPACK' 2025-08-14T23:46:57.1573880Z http.https://github.com/.extraheader 2025-08-14T23:46:57.1645982Z Entering 'third_party/aiter' 2025-08-14T23:46:57.1679627Z http.https://github.com/.extraheader 2025-08-14T23:46:57.1730015Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-08-14T23:46:57.1765708Z http.https://github.com/.extraheader 2025-08-14T23:46:57.1808684Z Entering 'third_party/benchmark' 2025-08-14T23:46:57.1832831Z http.https://github.com/.extraheader 2025-08-14T23:46:57.1867382Z Entering 'third_party/composable_kernel' 2025-08-14T23:46:57.1896851Z http.https://github.com/.extraheader 2025-08-14T23:46:57.1948654Z Entering 'third_party/cpp-httplib' 2025-08-14T23:46:57.1979593Z http.https://github.com/.extraheader 2025-08-14T23:46:57.2015163Z Entering 'third_party/cpuinfo' 2025-08-14T23:46:57.2037968Z http.https://github.com/.extraheader 2025-08-14T23:46:57.2079874Z Entering 'third_party/cudnn_frontend' 2025-08-14T23:46:57.2113882Z http.https://github.com/.extraheader 2025-08-14T23:46:57.2155430Z Entering 'third_party/cutlass' 2025-08-14T23:46:57.2183531Z http.https://github.com/.extraheader 2025-08-14T23:46:57.2226453Z Entering 'third_party/fbgemm' 2025-08-14T23:46:57.2248845Z http.https://github.com/.extraheader 2025-08-14T23:46:57.2291240Z Entering 'third_party/fbgemm/external/asmjit' 2025-08-14T23:46:57.2313651Z http.https://github.com/.extraheader 2025-08-14T23:46:57.2348829Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-08-14T23:46:57.2375672Z http.https://github.com/.extraheader 2025-08-14T23:46:57.2418779Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-08-14T23:46:57.2445599Z http.https://github.com/.extraheader 2025-08-14T23:46:57.2477505Z Entering 'third_party/fbgemm/external/cutlass' 2025-08-14T23:46:57.2501541Z http.https://github.com/.extraheader 2025-08-14T23:46:57.2548692Z Entering 'third_party/fbgemm/external/googletest' 2025-08-14T23:46:57.2573424Z http.https://github.com/.extraheader 2025-08-14T23:46:57.2616937Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-08-14T23:46:57.2640709Z http.https://github.com/.extraheader 2025-08-14T23:46:57.2679089Z Entering 'third_party/fbgemm/external/json' 2025-08-14T23:46:57.2703318Z http.https://github.com/.extraheader 2025-08-14T23:46:57.2747169Z Entering 'third_party/flash-attention' 2025-08-14T23:46:57.2778190Z http.https://github.com/.extraheader 2025-08-14T23:46:57.2820729Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-08-14T23:46:57.2847028Z http.https://github.com/.extraheader 2025-08-14T23:46:57.2885393Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-08-14T23:46:57.2904683Z http.https://github.com/.extraheader 2025-08-14T23:46:57.2949019Z Entering 'third_party/flatbuffers' 2025-08-14T23:46:57.2977342Z http.https://github.com/.extraheader 2025-08-14T23:46:57.3018787Z Entering 'third_party/fmt' 2025-08-14T23:46:57.3040614Z http.https://github.com/.extraheader 2025-08-14T23:46:57.3088153Z Entering 'third_party/gemmlowp/gemmlowp' 2025-08-14T23:46:57.3115074Z http.https://github.com/.extraheader 2025-08-14T23:46:57.3150554Z Entering 'third_party/gloo' 2025-08-14T23:46:57.3175087Z http.https://github.com/.extraheader 2025-08-14T23:46:57.3224942Z Entering 'third_party/googletest' 2025-08-14T23:46:57.3253684Z http.https://github.com/.extraheader 2025-08-14T23:46:57.3299531Z Entering 'third_party/ideep' 2025-08-14T23:46:57.3333412Z http.https://github.com/.extraheader 2025-08-14T23:46:57.3385537Z Entering 'third_party/ideep/mkl-dnn' 2025-08-14T23:46:57.3415227Z http.https://github.com/.extraheader 2025-08-14T23:46:57.3455479Z Entering 'third_party/ittapi' 2025-08-14T23:46:57.3476872Z http.https://github.com/.extraheader 2025-08-14T23:46:57.3531355Z Entering 'third_party/kineto' 2025-08-14T23:46:57.3559556Z http.https://github.com/.extraheader 2025-08-14T23:46:57.3595589Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-08-14T23:46:57.3623856Z http.https://github.com/.extraheader 2025-08-14T23:46:57.3686502Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-08-14T23:46:57.3714085Z http.https://github.com/.extraheader 2025-08-14T23:46:57.3751919Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-08-14T23:46:57.3781640Z http.https://github.com/.extraheader 2025-08-14T23:46:57.3833142Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-08-14T23:46:57.3862549Z http.https://github.com/.extraheader 2025-08-14T23:46:57.3921405Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-08-14T23:46:57.3973662Z http.https://github.com/.extraheader 2025-08-14T23:46:57.4025131Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-08-14T23:46:57.4062163Z http.https://github.com/.extraheader 2025-08-14T23:46:57.4128996Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-08-14T23:46:57.4155304Z http.https://github.com/.extraheader 2025-08-14T23:46:57.4209766Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-08-14T23:46:57.4235385Z http.https://github.com/.extraheader 2025-08-14T23:46:57.4288979Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-08-14T23:46:57.4315460Z http.https://github.com/.extraheader 2025-08-14T23:46:57.4357997Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-08-14T23:46:57.4382717Z http.https://github.com/.extraheader 2025-08-14T23:46:57.4419109Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-08-14T23:46:57.4448222Z http.https://github.com/.extraheader 2025-08-14T23:46:57.4491301Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-08-14T23:46:57.4512537Z http.https://github.com/.extraheader 2025-08-14T23:46:57.4569845Z Entering 'third_party/kleidiai' 2025-08-14T23:46:57.4597802Z http.https://github.com/.extraheader 2025-08-14T23:46:57.4658991Z Entering 'third_party/mimalloc' 2025-08-14T23:46:57.4692167Z http.https://github.com/.extraheader 2025-08-14T23:46:57.4737765Z Entering 'third_party/nlohmann' 2025-08-14T23:46:57.4774643Z http.https://github.com/.extraheader 2025-08-14T23:46:57.4819525Z Entering 'third_party/onnx' 2025-08-14T23:46:57.4854110Z http.https://github.com/.extraheader 2025-08-14T23:46:57.4900905Z Entering 'third_party/onnx/third_party/pybind11' 2025-08-14T23:46:57.4933946Z http.https://github.com/.extraheader 2025-08-14T23:46:57.4973300Z Entering 'third_party/opentelemetry-cpp' 2025-08-14T23:46:57.4998382Z http.https://github.com/.extraheader 2025-08-14T23:46:57.5051458Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-08-14T23:46:57.5087059Z http.https://github.com/.extraheader 2025-08-14T23:46:57.5128668Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-08-14T23:46:57.5154747Z http.https://github.com/.extraheader 2025-08-14T23:46:57.5189952Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-08-14T23:46:57.5218820Z http.https://github.com/.extraheader 2025-08-14T23:46:57.5270047Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-08-14T23:46:57.5310090Z http.https://github.com/.extraheader 2025-08-14T23:46:57.5348758Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-08-14T23:46:57.5372050Z http.https://github.com/.extraheader 2025-08-14T23:46:57.5411778Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-08-14T23:46:57.5433109Z http.https://github.com/.extraheader 2025-08-14T23:46:57.5462331Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-08-14T23:46:57.5482823Z http.https://github.com/.extraheader 2025-08-14T23:46:57.5524970Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-08-14T23:46:57.5565345Z http.https://github.com/.extraheader 2025-08-14T23:46:57.5619413Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-08-14T23:46:57.5657128Z http.https://github.com/.extraheader 2025-08-14T23:46:57.5705153Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-08-14T23:46:57.5739305Z http.https://github.com/.extraheader 2025-08-14T23:46:57.5811303Z Entering 'third_party/pocketfft' 2025-08-14T23:46:57.5844474Z http.https://github.com/.extraheader 2025-08-14T23:46:57.5885042Z Entering 'third_party/protobuf' 2025-08-14T23:46:57.5943863Z http.https://github.com/.extraheader 2025-08-14T23:46:57.5983813Z Entering 'third_party/protobuf/third_party/benchmark' 2025-08-14T23:46:57.6018781Z http.https://github.com/.extraheader 2025-08-14T23:46:57.6075480Z Entering 'third_party/protobuf/third_party/googletest' 2025-08-14T23:46:57.6104146Z http.https://github.com/.extraheader 2025-08-14T23:46:57.6151125Z Entering 'third_party/psimd' 2025-08-14T23:46:57.6180248Z http.https://github.com/.extraheader 2025-08-14T23:46:57.6229147Z Entering 'third_party/pthreadpool' 2025-08-14T23:46:57.6264962Z http.https://github.com/.extraheader 2025-08-14T23:46:57.6305757Z Entering 'third_party/pybind11' 2025-08-14T23:46:57.6338171Z http.https://github.com/.extraheader 2025-08-14T23:46:57.6384412Z Entering 'third_party/python-peachpy' 2025-08-14T23:46:57.6414074Z http.https://github.com/.extraheader 2025-08-14T23:46:57.6458974Z Entering 'third_party/sleef' 2025-08-14T23:46:57.6483595Z http.https://github.com/.extraheader 2025-08-14T23:46:57.6541060Z Entering 'third_party/tensorpipe' 2025-08-14T23:46:57.6577734Z http.https://github.com/.extraheader 2025-08-14T23:46:57.6634277Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-08-14T23:46:57.6670759Z http.https://github.com/.extraheader 2025-08-14T23:46:57.6711586Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-08-14T23:46:57.6735370Z http.https://github.com/.extraheader 2025-08-14T23:46:57.6777557Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-08-14T23:46:57.6803469Z http.https://github.com/.extraheader 2025-08-14T23:46:57.6838516Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-08-14T23:46:57.6862689Z http.https://github.com/.extraheader 2025-08-14T23:46:57.6900586Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-08-14T23:46:57.6932690Z http.https://github.com/.extraheader 2025-08-14T23:46:57.7203034Z Cleaning up orphan processes